arXiv:1503.00489v2 [math.ST] 16 Dec 2015 




Approximation and estimation of very small probabilities 
of multivariate extreme events 


Cees de Valk 


December 16, 2015 


Abstract This article discusses modelling of the tail of a multivariate distribution
function by means of a large deviation principle (LDP), and its application to the
estimation of the probability $p_n$ of a multivariate extreme event from a sample of $n$
iid random vectors, with $p_n \in [n^{-\tau_2}, n^{-\tau_1}]$ for some $\tau_1 > 1$ and
$\tau_2 > \tau_1$. One way to view classical tail limits is as limits of probability ratios.
In contrast, the tail LDP provides asymptotic bounds or limits for log-probability ratios.
After standardising the marginals to standard exponential, dependence is represented by a
homogeneous rate function $I$. Furthermore, the tail LDP can be extended to represent both
dependence and marginals, the latter implying marginal log-GW tail limits. A connection is
established between the tail LDP and residual tail dependence (or hidden regular variation)
and a recent extension of it. Under a smoothness assumption, they are implied by the
tail LDP. Based on the tail LDP, a simple estimator for very small probabilities of
extreme events is formulated. It avoids estimation of $I$ by making use of its homogeneity.
Strong consistency in the sense of convergence of log-probability ratios is proven.
Simulations and an application illustrate the difference between the classical approach
and the LDP-based approach.
1 Introduction

In this article, we will consider estimation of very small probabilities $p_n$ of multivariate
extreme events from a sample of size $n$, with

$$p_n \in [n^{-\tau_2}, n^{-\tau_1}] \quad \text{with } \tau_2 > \tau_1 > 1, \tag{1.1}$$

motivated by applications requiring quantile estimates for $p_n \ll 1/n$ in e.g. flood
protection and, more generally, natural hazard assessment, and in operational risk
assessment for financial institutions. Multivariate events with such low probabilities are
also relevant to these fields of application. Examples are breaching of a flood protection
consisting of multiple sections differing in exposure, design and maintenance along a
shoreline or river bank (Steenbergen et al. (2004)), damage to an offshore structure
caused by the combined effects of multiple environmental loads like water level, wave
height, etc. (ISO (2005)), and operational losses suffered by banks in different business
lines and due to various types of events (Embrechts & Puccetti (2007)).


Most work on estimation of probabilities of extreme events is based on the regularity
assumption that the distribution function $F$ is in the domain of attraction of some
extreme value distribution function (de Haan & Ferreira (2006); Resnick (1987)). In
the univariate case, this is equivalent to the generalised Pareto (GP) tail limit

$$\lim_{t \to \infty} t\,(1 - F(x w(t) + U(t))) = 1/h_\gamma^{-1}(x) \quad \forall x \in h_\gamma((0,\infty)) \tag{1.2}$$

for some positive function $w$, with $U(t) := F^{-1}(1 - 1/t)$ and

$$h_\gamma(\lambda) := \begin{cases} (\lambda^\gamma - 1)/\gamma & \text{if } \gamma \ne 0 \\ \log \lambda & \text{if } \gamma = 0 \end{cases} \tag{1.3}$$


for some $\gamma \in \mathbb{R}$, the extreme value index. In the multivariate case, with $F$ the
distribution function of a random vector $X = (X_1,..,X_m)$ with continuous marginals
$F_1,..,F_m$, it implies that each marginal satisfies the GP tail limit (1.2) and that
$V := (V_1,..,V_m)$, the random vector with standard Pareto marginals with

$$V_j := 1/(1 - F_j(X_j)) \tag{1.4}$$

for $j = 1,..,m$, satisfies

$$\lim_{t \to \infty} t\,P(V \in tA) = \nu(A) \tag{1.5}$$


for every Borel set $A \subset [0,\infty)^m$ such that $\inf_{x \in A} \max(x_1,..,x_m) > 0$ and
$\nu(\partial A) = 0$, with $\nu$ a measure satisfying $\nu(Aa) = a^{-1}\nu(A)$ for all these $A$
and all $a > 0$. Based on the GP tail limit and the exponent measure $\nu$ or its properties,
estimators for probabilities have been formulated; e.g. Smith et al. (1990), Coles & Tawn
(1990, 1994), Joe et al. (1992), Bruun & Tawn (1998), de Haan & Sinha (1999), Drees & de
Haan (2013).


If the maxima of some components of $X$ under consideration are asymptotically
independent, these estimators may produce invalid results. To alleviate this problem,
residual tail dependence (RTD), also known as hidden regular variation, was introduced
as an additional regularity assumption on the tail of the multivariate survival function
$F^c$, defined by $F^c(x) = P(X_i > x_i\ \forall i \in \{1,..,m\})$; e.g. Ledford & Tawn (1996,
1997, 1998), Peng (1999), Resnick (2002), Draisma et al. (2004) and Heffernan & Resnick
(2005). This model was recently extended in Wadsworth & Tawn (2013). Another
approach, based on conditional limits, was proposed in Heffernan & Tawn (2004) and
Heffernan & Resnick (2007).


The first-order tail regularity conditions (1.2) and (1.5) can be seen as limiting
relations for probability ratios, so they only allow estimation of probabilities $p_n$
vanishing slowly enough, that is,

$$p_n \ge \lambda k_n/n \tag{1.6}$$

for some $\lambda > 0$ and some intermediate sequence $(k_n)$, and therefore, $np_n \to \infty$ as
$n \to \infty$. For an iid sample, the empirical probability $\hat{p}_n$ is an unbiased estimator for
such $p_n$, satisfying $\hat{p}_n/p_n \to 1$ (from the binomial distribution of $n\hat{p}_n$).
Therefore, estimators for these $p_n$ which make use of tail regularity can at best achieve a
reduction in variance when compared to $\hat{p}_n$. To allow tail extrapolation to be car-



Proposition 1. For example, they exclude the normal and the lognormal distribution,
but also, for all $a \in (0,\infty) \setminus \{1\}$, the distribution functions of $Y^a$ with $Y$
exponentially distributed, and of $\exp((\log V)^a)$ with $V$ Pareto distributed.

To overcome these limitations, we will consider a different approach in this paper.
Rather than imposing additional assumptions on convergence beyond the first-order
limits (1.2) and (1.5), we will attempt to replace them by different types of first-order
limits more suitable for the probability range (1.1). Suppose that $(k_n)$ satisfies
$1 \le k_n \le n^c$ for some $c \in (0,1)$. Then $(p_n)$ satisfying (1.1) does not satisfy (1.6), but

$$\tau_1 \le \frac{\log p_n}{\log(k_n/n)} \le \tau_2/(1 - c) < \infty. \tag{1.7}$$

This suggests that replacing the classical limits of probability ratios by limits of
log-probability ratios could provide a framework for constructing estimators for
probabilities of extreme events in the range (1.1).
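
The contrast between probability ratios and log-probability ratios can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $p_n = n^{-\tau}$ and $k_n = n^c$ (the function names are hypothetical, not taken from the article): the ratio $np_n$ vanishes, while the log-probability ratio in (1.7) stays at $\tau/(1-c)$.

```python
import math

def log_probability_ratio(n, tau, c):
    """Return log(p_n) / log(k_n / n) for p_n = n**(-tau) and k_n = n**c."""
    log_p = -tau * math.log(n)               # log p_n, kept in log space
    log_k_over_n = (c - 1.0) * math.log(n)   # log(k_n / n)
    return log_p / log_k_over_n

def expected_exceedances(n, tau):
    """Return n * p_n for p_n = n**(-tau); tends to 0 as n grows when tau > 1."""
    return float(n) ** (1.0 - tau)

# Assumed illustrative values: tau = 2 and c = 1/2, so tau / (1 - c) = 4.
ratios = [log_probability_ratio(n, tau=2.0, c=0.5) for n in (10**3, 10**6, 10**9)]
counts = [expected_exceedances(n, tau=2.0) for n in (10**3, 10**6, 10**9)]
```

Here `ratios` is constant in $n$, whereas `counts` decreases to zero, which is why (1.6) fails in the range (1.1).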


In the next section, we address the limiting behaviour of log-probability ratios in the
univariate case as introduction to the multivariate case. We will find that this behaviour
is described by a large deviation principle (LDP) (see e.g. Dembo & Zeitouni (1998)).
It is generalised to the multivariate setting in Section 3. In Section 4, we establish a
connection between the tail LDP and residual tail dependence and related assumptions.
Section 5 returns to the basic LDP and applies it to formulate a simple estimator
for probabilities of extreme events in the range (1.1) and to prove its consistency. In
Section 6, this estimator is compared to its classical analogues in simulations, and an
application of the LDP-based estimator is presented as illustration. Section 7 closes
with a discussion of the results and of outstanding issues. Readers primarily interested
in tail dependence could scan Section 2 for the approach and background, read the first
part of Section 3 until eq. (3.12), and then continue with Sections 4-7. Lemmas can be
found in Section 8.

The following notation is adopted: Id denotes the identity. The interior of a set $S$ is
denoted by $S^\circ$ and its closure by $\bar{S}$. The image of a set $S$ under a function $f$ is
written as $f(S)$. The infimum of an (extended) real function $f$ over $S$ is written as
$\inf f(S)$; by convention, $\inf \emptyset := \infty$. To avoid tedious repetition, expressions of the
form $a \le \liminf_{y \to \infty} f(y) \le \limsup_{y \to \infty} f(y) \le b$ are abbreviated to
$a \le \liminf_{y \to \infty} f(y) \le \limsup_{y \to \infty} \ldots \le b$.


1 Common assumptions are strong second-order extended regular variation as in e.g.
Theorem 4.3.1(1) of de Haan & Ferreira (2006) or the Hall class (Hall (1982)).
2 Introducing the tail LDP: the univariate case


We will begin by examining the univariate case in order to become acquainted with a 
particular type of large deviation principle (LDP) as a model of the tail of a distribution 
function. 

Let $X$ be a real-valued random variable and let $\{b_y, y > 0\}$ be a family of real
functions such that for $D \subset [0,\infty)$, $b_y(D)$ becomes more extreme in some sense when
$y$ is increased. In line with the classical limits (e.g. (1.2)), we could consider an affine
function for $b_y$, i.e., $b_y(x) = r(y) + g(y)x$, with $r$ some nondecreasing function and $g$
some measurable positive function. Instead, for a reason to be explained later, we will
assume that $F(0) < 1$ and consider

$$b_y(x) = r(y)e^{g(y)x} \tag{2.1}$$

with $g$ and $r$ as above and $r(\infty) > 0$. We will examine the limiting behaviour of

$$\frac{1}{y}\log P(X \in b_y(D)) \tag{2.2}$$

as $y \to \infty$. Substituting $y_n = -\log(k_n/n)$ for $y$, this determines the behaviour of the
log-probability ratio in (1.7) with $p_n = P(X \in b_{y_n}(D))$ as $n \to \infty$.


Generally speaking, normalised logarithms of probabilities like (2.2) do not need
to satisfy limits, so we will only assume that$^2$

$$J(D^\circ) \le \liminf_{y \to \infty} \frac{1}{y}\log P(X \in b_y(D)) \le \limsup_{y \to \infty} \ldots \le J(\bar{D}) \tag{2.3}$$

for (at least) $D = (x,\infty)$ for all $x > 0$, with $J$ some monotonic set function taking
values in $[-\infty, 0]$. Noting that $\varphi(x) := -J((x,\infty))$ is nondecreasing in $x$, we have at
every continuity point $x$ of $\varphi$ in $(0,\infty)$,

$$\lim_{y \to \infty} \frac{1}{y}\log(1 - F(e^{g(y)x} r(y))) = -\varphi(x). \tag{2.4}$$

Let $q$ be the left-continuous inverse of $-\log(1 - F)$, so

$$q := F^{-1}(1 - e^{-\mathrm{Id}}) = U \circ \exp. \tag{2.5}$$


Assume that $\varphi$ is not constant. By Lemma 1.1.1 of de Haan & Ferreira (2006), (2.4)
implies $\lim_{y \to \infty} (\log q(y\lambda) - \log r(y))/g(y) = \varphi^{-1}(\lambda)$ at every continuity point
$\lambda$ of the left-continuous inverse $\varphi^{-1}$ of $\varphi$ in $(\varphi(0), \varphi(\infty))$. Therefore (cf. the proof
of Theorem 1.1.3 in de Haan & Ferreira (2006)), we may take $r = q$ and choose $g$ measurable
and such that $\varphi^{-1}(\lambda) = h_\theta(\lambda)$ for some real $\theta$ (see (1.3)). As a result,

$$\lim_{y \to \infty} \frac{\log q(y\lambda) - \log q(y)}{g(y)} = h_\theta(\lambda) \quad \forall \lambda > 0 \tag{2.6}$$

and from (2.4),

$$\lim_{y \to \infty} \frac{1}{y}\log(1 - F(e^{g(y)x} q(y))) = -h_\theta^{-1}(x) \quad \forall x \in h_\theta((0,\infty)). \tag{2.7}$$


Eq. (2.6) states that $\log q$ is extended regularly varying with index $\theta$. By (2.6),
$\lim_{y \to \infty} g(y\lambda)/g(y) = \lambda^\theta$ for all $\lambda > 0$, so $g \in RV_\theta$ ($g$ is regularly varying with
index $\theta$); see Appendix B of de Haan & Ferreira (2006). Now (2.3) can be fully specified:


2 See the end of Section 1 for the notation employed here.
Proposition 1 (a) Suppose that asymptotic bounds (2.3) with (2.1) apply to all $D$ of
the form $D = (x,\infty)$ with $x > 0$, with $J$ monotonic and $x \mapsto J((x,\infty))$ non-constant.
Then $g$ in (2.1) can be chosen such that (2.3) holds with $r = q$ and
$J = -\inf h_\theta^{-1}(\mathrm{Id})$ for some $\theta \in \mathbb{R}$ for every Borel set $D \subset [0,\infty)$, i.e.,

$$-\inf h_\theta^{-1}(D^\circ) \le \liminf_{y \to \infty} \frac{1}{y}\log P\left(\frac{\log X - \log q(y)}{g(y)} \in D\right) \le \limsup_{y \to \infty} \ldots \le -\inf h_\theta^{-1}(\bar{D}); \tag{2.8}$$

(b) Eq. (2.6) is equivalent to (2.7), which is equivalent to (2.8).


Proof We have proven that (2.3) for $D = (x,\infty)$ implies the equivalent limit
relations (2.6) and (2.7), so it remains to be shown that (2.7) implies (2.8) for
every Borel set $D \subset [0,\infty)$. The lower bound holds if $D^\circ$ is empty. Else, with
$a := \inf h_\theta^{-1}(D^\circ) > 0$ and $\delta > 0$ such that $(h_\theta(a), h_\theta(a + \delta)] \subset D^\circ$, and for
every $\varepsilon \in (0, \delta/2)$,

$$P\left(\frac{\log X - \log q(y)}{g(y)} \in D^\circ\right) \ge F(e^{g(y)h_\theta(a+\delta)}q(y)) - F(e^{g(y)h_\theta(a)}q(y)) \ge e^{-y(a+\varepsilon)} - e^{-y(a+\delta-\varepsilon)} = e^{-y(a+\varepsilon)}(1 - e^{-y(\delta-2\varepsilon)}),$$

provided that $y$ is large enough, as a consequence of (2.7). As $\varepsilon > 0$ is arbitrary, this
implies the lower bound in (2.8). The proof of the upper bound is similar and is therefore
omitted. □


The pair of equivalent limit relations (2.6) and (2.7) was named the log-Generalised
Weibull (log-GW) tail limit in de Valk (2014), where it was proposed as a model for
estimating high quantiles for probabilities in the range (1.1), as an alternative to the
more familiar GP tail limit. If $\theta = 0$ and $g(y) \to g_\infty > 0$ as $y \to \infty$, it reduces to the
Weibull tail limit; see e.g. Broniatowski (1993) and Klüppelberg (1991).

The log-GW tail limit looks deceptively similar to a GP tail limit, but it is a very
different beast, primarily due to the logarithm in (2.7) (or equivalently, due to the
exponent in (2.5)). Its domain of attraction covers a wide range of tail weights: a class
of light tails having finite endpoints, tails with Weibull limits (such as the normal
distribution), all tails with classical Pareto tail limits and, more generally, with
log-Weibull tail limits. For the latter, $F \circ \exp$ satisfies a Weibull tail limit; an example is
the lognormal distribution. For estimation of high quantiles with probabilities in the range
(1.1) of distribution functions within the domain of attraction of the GP limit with
$\gamma = 0$, the log-GW tail limit offers a continuum of limits instead of just one; as a
consequence, it is much more widely applicable (see de Valk (2014)). Readers more
comfortable with classical tail limits may consider focusing on tails with a Pareto tail
limit ($\gamma > 0$), which have a log-GW limit with $\theta = 1$ (so $h_\theta(\lambda) = \lambda - 1$) and
$g(y) = \gamma y$. This may make reading of the rest of the article easier.
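
As a small numerical companion to (2.6), the sketch below evaluates the left-hand side of (2.6) at a finite $y$ for two textbook tails where $q$ is available in closed form. The distributions, the parameter values `GAMMA` and `K`, and all function names are assumptions made for illustration only: a Pareto tail $1 - F(x) = x^{-1/\gamma}$ (so $q(y) = e^{\gamma y}$, $g(y) = \gamma y$, $\theta = 1$) and a Weibull tail $1 - F(x) = e^{-x^k}$ (so $q(y) = y^{1/k}$, $g(y) = 1/k$, $\theta = 0$).

```python
import math

def h(theta, lam):
    """h_theta(lambda) as in (1.3): (lambda**theta - 1)/theta, or log(lambda) if theta == 0."""
    if theta == 0.0:
        return math.log(lam)
    return (lam ** theta - 1.0) / theta

def log_gw_ratio(log_q, g, y, lam):
    """Left-hand side of (2.6) at finite y: (log q(y*lam) - log q(y)) / g(y)."""
    return (log_q(y * lam) - log_q(y)) / g(y)

GAMMA = 0.5  # assumed Pareto tail index
K = 2.0      # assumed Weibull shape parameter

def pareto_log_q(y):
    return GAMMA * y           # log q(y) for q(y) = exp(GAMMA * y)

def pareto_g(y):
    return GAMMA * y           # scaling function, with theta = 1

def weibull_log_q(y):
    return math.log(y) / K     # log q(y) for q(y) = y**(1/K)

def weibull_g(y):
    return 1.0 / K             # constant scaling function, with theta = 0

pareto_value = log_gw_ratio(pareto_log_q, pareto_g, y=50.0, lam=3.0)     # compare with h(1, 3)
weibull_value = log_gw_ratio(weibull_log_q, weibull_g, y=50.0, lam=3.0)  # compare with h(0, 3)
```

In both of these cases the relation (2.6) holds exactly at every $y$, not only in the limit; for tails such as the normal distribution, only the limit statement applies.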

An expression of the form (2.8) is an example of a large deviation principle$^3$ (LDP);
see Section 1.2 of Dembo & Zeitouni (1998) for a general background. The rate function
of the LDP (2.8) is $h_\theta^{-1}$. The bounds provided by an LDP are crude; for example, they
are unaffected by multiplying the probability in (2.8) by a positive number. One could
see this as the price to be paid for approximating probabilities over a very wide range.
More precise bounds may exist, but such cases should be regarded as the exception
rather than the rule. Observe also that the bounds do not involve integration and in
fact, most of $D$ does not even matter to the values of the bounds. The LDP (2.8)
reduces to a limit only if $D$ satisfies $\inf h_\theta^{-1}(D^\circ) = \inf h_\theta^{-1}(\bar{D})$; such a $D$ is
called a continuity set of the rate function.


3 An LDP on a topological space $\mathcal{T}$ is an expression of the form (2.8) with
$P((\log X - \log q(y))/g(y) \in D)$ generalised to $\mu_y(D)$, with $\{\mu_y, y > 0\}$ some family of
probability measures on the Borel $\sigma$-algebra, and $h_\theta^{-1}$ generalised to some rate function
(= lower semicontinuous function) $I$; the expression is supposed to hold for every Borel set
$D$ in $\mathcal{T}$.
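
The remark that LDP bounds are unaffected by a constant factor can be made concrete. The sketch below is an illustration under an assumed Pareto tail $1 - F(z) = z^{-1/\gamma}$, $z \ge 1$, for which $q(y) = e^{\gamma y}$, $g(y) = \gamma y$, $\theta = 1$ and the probability in (2.8) with $D = (x,\infty)$, $x \ge 0$, equals $e^{-y(1+x)}$ exactly. The function name and the factor `c` are hypothetical.

```python
import math

def normalised_log_prob(y, x, c=1.0):
    """Return (1/y) * log(c * P((log X - log q(y)) / g(y) > x)) for a Pareto X.

    Under the assumed Pareto tail, the probability is exp(-y * (1 + x)) for x >= 0
    (the tail index gamma cancels), so the logarithm is handled in log space.
    """
    log_p = -y * (1.0 + x)
    return (math.log(c) + log_p) / y

# Rate function value: inf of h_1^{-1} over (x, inf) is 1 + x.
exact = normalised_log_prob(1e4, 0.5)            # equals -(1 + 0.5)
scaled = normalised_log_prob(1e4, 0.5, c=100.0)  # differs only by log(100) / y
```

Multiplying the probability by 100 shifts the normalised logarithm by $\log(100)/y$, which vanishes as $y \to \infty$; the limit $-h_\theta^{-1}(x) = -(1+x)$ is unchanged.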


Had we considered events of the form $b_y(x) = r(y) + g(y)x$ instead of (2.1), then
in the same way as above, we would have arrived at a different tail limit, the GW limit
defined by replacing $\log q$ by $q$ in (2.6) (see de Valk (2014)). Its domain of attraction
covers a much more limited range of tail weights. Furthermore, if $F(0) < 1$, then
the GW limit implies a log-GW limit (cf. the proof of Lemma 3.5.1 in de Haan &
Ferreira (2006)). Therefore, to ensure that the results of this article are sufficiently
widely applicable, we focus on the log-GW limit.

The events considered in (2.8) with $D \subset [0,\infty)$ imply that $X$ is in the interval$^4$
$[q(y),\infty)$ for $q(y) > 0$. In a multivariate setting, it would be desirable to extend this
interval to $\mathbb{R}$, since a multivariate event could be extreme in one variable, but not in
some other variable. This can be accomplished using a trick: define an approximation
$q_y$ of $q$ (see (2.5)) for $y \in g^{-1}((0,\infty))$ by

$$q_y(z) := \begin{cases} q(z) & \text{if } z \le y \\ q(y)e^{g(y)h_\theta(z/y)} & \text{if } z > y, \end{cases} \tag{2.9}$$

so for $z > y$, $q_y(z)$ is the log-GW tail approximation; for $z \le y$, it is exact.

A random variable $Y$ with the standard exponential distribution satisfies

$$-\inf A^\circ \le \liminf_{y \to \infty} \frac{1}{y}\log P(Y \in Ay) \le \limsup_{y \to \infty} \ldots \le -\inf \bar{A} \tag{2.10}$$

for every Borel set $A \subset [0,\infty)$, which can be proven in a similar manner as
Proposition 1. If $F$ is continuous, then $Y = -\log(1 - F(X))$ has the standard exponential
distribution and $q$ is increasing, so we can substitute $P(X \in q(Ay))$ for $P(Y \in Ay)$ in
(2.10). Under the assumptions of Proposition 1, we can substitute $P(X \in q_y(Ay))$ for
$P(Y \in Ay)$ in (2.10) as well, extending (2.8) to

$$-\inf A^\circ \le \liminf_{y \to \infty} \frac{1}{y}\log P(X \in q_y(Ay)) \le \limsup_{y \to \infty} \ldots \le -\inf \bar{A}. \tag{2.11}$$


Proposition 2 (a) If $F$ is continuous, then (2.8) for every Borel set $D \subset [0,\infty)$,
(2.6) and (2.7) are all equivalent to (2.11) for every Borel set $A \subset [0,\infty)$.


Proof Equivalence of (2.6), (2.7) and (2.8) follows from Proposition 1(b). If
$A^\circ \cap (-\infty,1)$ is nonempty, then $P(X \in q_y(Ay)) \ge P(Y \in (A^\circ \cap (-\infty,1))y)$ and the
lower bound in (2.11) follows from (2.10). If not, then by (2.9), $P(X \in q_y(Ay)) \ge
P(X \in q_y(A^\circ y)) = P((\log X - \log q(y))/g(y) \in h_\theta(A^\circ))$ with $h_\theta(A^\circ) \subset [0,\infty)$, so
Proposition 1 implies the lower bound in (2.11). The upper bound is proven similarly.
To show that (2.11) implies (2.7) for $x \in h_\theta((1,\infty))$, take $A = [\lambda,\infty)$ for $\lambda > 1$; it can
be extended to $x \in h_\theta((0,\infty))$ by a standard argument. □


When restricting $A$ to $[1,\infty)$, (2.11) is equivalent to (2.8) for $D = h_\theta(A)$. With
$A$ in $[0,\infty)$, therefore, (2.11) provides the intended generalisation of (2.8). Note that
the log-GW index $\theta$ and auxiliary function $g$ are now hidden in the approximation $q_y$
in (2.9). However, they are as essential in (2.11) as they are in the more explicit (2.8).

4 Depending on $\theta$, we can extend this somewhat to $[q(y)e^{-cg(y)}, \infty)$ for some $c > 0$; see (2.7).
3 Bounds and limits for probabilities of multivariate tail events


For the univariate tail, we obtained the LDP (2.11) in a form which closely resembles
(2.10) for the standard exponential distribution. This suggests that for a multivariate
generalisation, we examine first the case of a random vector $Y := (Y_1,..,Y_m)$ with
distribution function having standard exponential marginals. A straightforward
multivariate generalisation of the LDP (2.10) would be

$$-\inf I(A^\circ) \le \liminf_{y \to \infty} \frac{1}{y}\log P(Y/y \in A) \le \limsup_{y \to \infty} \ldots \le -\inf I(\bar{A}) \tag{3.1}$$

for every Borel set $A \subset [0,\infty)^m$, with $I$ some rate function; we may regard (3.1) as the
analogue of the classical expression (1.5). Further on, we will prove that (3.1) holds if

$$I(x) := -\inf_{\varepsilon > 0} \liminf_{y \to \infty} \frac{1}{y}\log P(Y/y \in B_\varepsilon(x)) = -\inf_{\varepsilon > 0} \limsup_{y \to \infty} \frac{1}{y}\log P(Y/y \in B_\varepsilon(x)), \tag{3.2}$$

with $B_\varepsilon(x) := \{x' \in \mathbb{R}^m : \|x - x'\|_\infty < \varepsilon\}$ the open ball of radius $\varepsilon > 0$ with centre
$x \in \mathbb{R}^m$. For now, we turn to the rate function $I$, defined by (3.2) as some kind of
limiting density, with the probability of an open ball replaced by its logarithm. Several
properties of $I$ follow immediately from (3.2) and the exponential marginals of $Y$. For
every $\varepsilon > 0$ and $x \in \mathbb{R}^m$ with $x_j = \lambda > 0$ for some $j \in \{1,..,m\}$,
$\frac{1}{y}\log P(Y/y \in B_\varepsilon(x)) \le \frac{1}{y}\log P(Y_j/y > (\lambda - \varepsilon)) = \varepsilon - \lambda$, so

$$I(x) \ge \max(x_1,..,x_m) \quad \forall x \in \mathbb{R}^m. \tag{3.3}$$

This implies that $I$ is a good rate function, meaning that $I^{-1}([0,a])$ is compact for
every $a \in [0,\infty)$. Also, since $B_\varepsilon(x\lambda) = \lambda B_{\varepsilon/\lambda}(x)$,

$$I(x\lambda) = \lambda I(x) \quad \forall \lambda > 0,\ x \in \mathbb{R}^m. \tag{3.4}$$

Furthermore, $I(0) = 0$, since $P(\|Y\|_\infty < y\varepsilon) \ge 1 - mP(Y_1 > \varepsilon y) = 1 - me^{-\varepsilon y}$ in
(3.2), and $I(x) = \infty$ whenever $\min(x_1,..,x_m) < 0$.

Remark 1 By (3.4), $I(x) = \varrho(x)I(x/\varrho(x))$ for every $x \in \mathbb{R}^m \setminus \{0\}$ and every norm
$\varrho$ on $\mathbb{R}^m$. This gives for every norm a "spectral representation" of $I$, analogous to the
spectral measures in classical extreme value theory (e.g. de Haan & Ferreira (2006),
Section 6.1.4). For example, in the bivariate case, the rate function can be represented
on $[0,\infty)^2 \setminus \{0\}$ by $I(x) = (x_1 + x_2)\psi(x_2/(x_1 + x_2))$ with $\psi(t) := I(1 - t, t)$ for
$t \in [0,1]$, so by (3.3), it satisfies $\psi(t) \ge \max(t, 1 - t)$ for all $t \in [0,1]$. The similarity
of $\psi$ to the dependence function $A$ of Pickands (1981) may be misleading, as a rate
function defined by (3.2) and a distribution function are very different objects. Besides
satisfying $A(t) \ge \max(t, 1 - t)$ for all $t \in [0,1]$, Pickands' function $A$ is convex, and
$A(1) = A(0) = 1$. These latter conditions do not need to apply to $\psi$.


Example 1 Let $X \sim \mathcal{N}(0, V)$ with $V$ an $m \times m$ positive-definite matrix with unit
diagonal; let $W := V^{-1}$. Then

$$I(x) = \sum_{i,j \in \{1,..,m\}} W_{ij}\sqrt{x_i x_j}, \quad x \in [0,\infty)^m.$$

In the bivariate case with $V_{12} = V_{21} =: \rho$, $I(x) = (x_1 + x_2 - 2\rho\sqrt{x_1 x_2})/(1 - \rho^2)$,
so $\psi(t) = (1 - 2\rho\sqrt{t(1 - t)})/(1 - \rho^2)$. If $\rho > 0$, then $I$ is convex and therefore, $\psi$ is
convex. Figure 3.1 shows contour plots of $I$ for $\rho = 0.8$ (left) and $\rho = 0.2$ (middle). On
the right, the function $\psi$ is plotted for these two values of $\rho$; for both, $\psi(1) = \psi(0) > 1$.

Fig. 3.1 Left and middle: contours of the rate function $I$ (drawn) and function k (dotted;
see Section 4) for the bivariate normal distribution with standard marginals and $\rho = 0.8$ (left)
and $\rho = 0.2$ (middle). Right: function $\psi$ (see text) for $\rho = 0.8$ (drawn) and $\rho = 0.2$ (dashed).
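
The rate function of Example 1 is simple to evaluate, and its stated properties can be checked numerically. The sketch below (function names are hypothetical; only the bivariate case with unit variances is covered) computes $I$ from $W = V^{-1}$ and the function $\psi$ of Remark 1, so that homogeneity (3.4) and the bound (3.3) can be verified on a grid.

```python
import math

def gaussian_rate(x, w):
    """I(x) = sum_{i,j} W_ij * sqrt(x_i * x_j) for x in [0, inf)^m (Example 1)."""
    m = len(x)
    return sum(w[i][j] * math.sqrt(x[i] * x[j]) for i in range(m) for j in range(m))

def bivariate_precision(rho):
    """W = V^{-1} for the correlation matrix V = [[1, rho], [rho, 1]], |rho| < 1."""
    d = 1.0 - rho * rho
    return [[1.0 / d, -rho / d], [-rho / d, 1.0 / d]]

def psi(t, rho):
    """psi(t) := I(1 - t, t) as in Remark 1, for the bivariate normal case."""
    return gaussian_rate((1.0 - t, t), bivariate_precision(rho))

grid = [k / 20.0 for k in range(21)]
psi_strong = [psi(t, 0.8) for t in grid]   # rho = 0.8, as in the left panel of Fig. 3.1
psi_weak = [psi(t, 0.2) for t in grid]     # rho = 0.2, as in the middle panel
```

For both values of $\rho$ the computed $\psi$ stays above $\max(t, 1-t)$, and $\psi(0) = \psi(1) = 1/(1 - \rho^2) > 1$, in line with the text.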


Remark 2 If $\lim_{y \to \infty} P(\min_{j=1,..,m} Y_j > y)/P(Y_1 > y) > 0$, then $I(\mathbf{1}) = 1$ with
$\mathbf{1} = (1,..,1)$; the converse is not true. In the bivariate case, $I(\mathbf{1}) = 1$ implies that
$\psi(\frac{1}{2}) = \frac{1}{2}$ for $\psi$ in Remark 1. It does not fix $\psi(t)$ at other $t \in [0,1]$. For example, for
the bivariate normal distribution with standard marginals (see Example 1) and with
$\rho = 1$, $\psi(t) = \infty$ for all $t \in [0,1] \setminus \{\frac{1}{2}\}$, but for a positive mixture of this distribution
function with a similar one with $\rho = r < 1$, $\psi(t) = (1 - 2r\sqrt{t(1 - t)})/(1 - r^2)$ for all
$t \in [0,1] \setminus \{\frac{1}{2}\}$.


Remark 3 If $I$ is subadditive, then by (3.4), it is convex, and furthermore, by (3.3), it
is a norm. Again, this condition does not need to be satisfied in general.


Since $P(Y_1 > y\alpha) \le P(\|Y\|_\infty > y\alpha) \le mP(Y_1 > y\alpha)$,

$$\lim_{y \to \infty} \frac{1}{y}\log P(\|Y\|_\infty/y > \alpha) = -\alpha \quad \forall \alpha > 0, \tag{3.5}$$

so the tail of the maximum of $Y_1,..,Y_m$ satisfies the same limit relation as the tails
of each of $Y_1,..,Y_m$ individually. This implies that the family of probability measures
corresponding to the random variables $\{Y/y, y > 0\}$ is exponentially tight (Dembo &
Zeitouni (1998)): for every $\alpha < \infty$, a compact $E_\alpha \subset \mathbb{R}^m$ exists such that

$$\limsup_{y \to \infty} \frac{1}{y}\log P(Y/y \notin E_\alpha) < -\alpha, \tag{3.6}$$

which follows from (3.5) when taking $E_\alpha = \{x \in \mathbb{R}^m : \|x\|_\infty \le \alpha + \varepsilon\}$ for some $\varepsilon > 0$.
As a consequence,

Theorem 1 If the random vector $Y := (Y_1,..,Y_m)$ with standard exponential marginals
satisfies (3.2), then it satisfies (3.1) for all Borel $A \subset [0,\infty)^m$ with good rate function
$I$ satisfying (3.4), $I(0) = 0$ and the marginal condition

$$\inf_{x \in \mathbb{R}^m :\, x_j > \lambda} I(x) = \lambda \quad \forall \lambda > 0,\ j \in \{1,..,m\}. \tag{3.7}$$

Proof By Theorem 4.1.11 in Dembo & Zeitouni (1998), (3.2) implies the weak LDP,
i.e., the lower bound in (3.1) holds for all Borel $A$, and the upper bound of (3.1) holds
for all compact $A$. Because of exponential tightness (3.6), this implies the LDP (3.1);
see Lemma 1.2.18 in Dembo & Zeitouni (1998). Then (3.7) follows from (3.1) and the
exponential marginals of $Y$. □
Remark 4 (3.3) is implied by (3.7).


For a continuity set of $I$ satisfying that $\inf I(\bar{A}) = \inf I(A^\circ)$, the bounds in (3.1) reduce
to a limit:

$$\lim_{y \to \infty} \frac{1}{y}\log P(Y \in A\lambda y) = -\lambda \inf I(A) \quad \forall \lambda > 0. \tag{3.8}$$

A sufficient condition for a set $A$ to be a continuity set of $I$ is that $I$ is continuous
and $A \subset \overline{A^\circ}$. Homogeneity (3.4) of $I$ allows us to relax this condition: without assuming
continuity of $I$, $A$ is a continuity set if $\inf I(\bar{A}) = I(x)$ for some
$x \in \overline{A^\circ} \cap \bigcup_{\lambda > 0}(\lambda A^\circ)$ ($\bigcup_{\lambda > 0}(\lambda A^\circ)$ is the smallest cone containing $A^\circ$).
A bivariate example is sketched in Figure 3.2. Let $A^\circ$ be the grey set; if $I$ attains its
infimum over $\bar{A}$ on the part of its boundary drawn as a fat line (excluding the points
indicated by circles), then $A$ is a continuity set. In the remainder of this article, we will
discuss continuity sets of rate functions without considering the particular conditions
which make them so.



Fig. 3.2 Illustration of a continuity set of $I$ (see main text)
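
A case in which the limit (3.8) can be followed in closed form is that of independent components. This is an assumed illustration, not an example from the article: for independent standard exponential $Y_1, Y_2$ the rate function is $I(x) = x_1 + x_2$ on $[0,\infty)^2$, and for the set $A = \{x \in [0,\infty)^2 : x_1 + x_2 > s\}$ the probability $P(Y \in Ay) = (1 + ys)e^{-ys}$ follows from the Gamma(2, 1) distribution of $Y_1 + Y_2$. Function names are hypothetical.

```python
import math

def independent_rate(x):
    """Rate function for independent standard exponential components (assumed example):
    I(x) = x_1 + .. + x_m on [0, inf)^m, and infinity if any component is negative."""
    if min(x) < 0.0:
        return math.inf
    return float(sum(x))

def normalised_log_prob_sum(y, s):
    """(1/y) * log P(Y_1 + Y_2 > y*s) for independent standard exponentials.

    Y_1 + Y_2 has a Gamma(2, 1) distribution, so the probability equals
    (1 + y*s) * exp(-y*s); the logarithm is evaluated without underflow.
    """
    return (math.log1p(y * s) - y * s) / y

s = 1.5
values = [normalised_log_prob_sum(y, s) for y in (1e1, 1e3, 1e6)]  # approaches -s
```

The infimum of $I$ over $A$ is $s$, attained along the whole boundary segment $x_1 + x_2 = s$, and `values` approaches $-s$ at the slow rate $\log(1 + ys)/y$. This slow correction is the kind of sub-exponential factor that an LDP ignores.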


It is straightforward to extend Theorem 1 to a random vector $X$ with a distribution
function $F$ having continuous marginals $F_1,..,F_m$. As in (2.5), let for $j = 1,..,m$,

$$q_j := F_j^{-1}(1 - e^{-\mathrm{Id}}) \tag{3.9}$$

and for every $x \in [0,\infty)^m$,

$$Q(x) := (q_1(x_1),..,q_m(x_m)). \tag{3.10}$$

Let $Y := (Y_1,..,Y_m)$ with for $j = 1,..,m$,

$$Y_j := -\log(1 - F_j(X_j)), \tag{3.11}$$

so $Y = Q^{-1}(X)$. Because $F_1,..,F_m$ are continuous, $Y$ has exponential marginals.
Almost surely, $X = Q(Y)$ with $Q$ defined by (3.10) and (3.9). Since $Q$ is injective,
$P(X \in Q(yA)) = P(Y \in yA)$, so (3.1) is equivalent to

$$-\inf I(A^\circ) \le \liminf_{y \to \infty} \frac{1}{y}\log P(X \in Q(yA)) \le \limsup_{y \to \infty} \ldots \le -\inf I(\bar{A}). \tag{3.12}$$
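
The marginal standardisation (3.9)-(3.11) can be sketched as follows, assuming known Pareto marginals $1 - F_j(x) = x^{-1/\gamma_j}$, $x \ge 1$, purely for illustration (in practice the marginals are unknown and must be estimated). The helper names and the values in `GAMMAS` are hypothetical.

```python
import math

def to_exponential(x, survival_fns):
    """Y_j = -log(1 - F_j(X_j)) as in (3.11), given the marginal survival functions 1 - F_j."""
    return [-math.log(sf(xj)) for xj, sf in zip(x, survival_fns)]

def Q(y, quantile_fns):
    """Q(y) = (q_1(y_1), .., q_m(y_m)) as in (3.10)."""
    return [q(yj) for yj, q in zip(y, quantile_fns)]

GAMMAS = (0.5, 2.0)  # assumed Pareto tail indices of the two marginals

# 1 - F_j(x) = x**(-1/gamma_j), and hence q_j(y) = exp(gamma_j * y) by (3.9).
survival_fns = [lambda x, g=g: x ** (-1.0 / g) for g in GAMMAS]
quantile_fns = [lambda y, g=g: math.exp(g * y) for g in GAMMAS]

x = [3.0, 7.0]
y = to_exponential(x, survival_fns)   # coordinates on the standard exponential scale
x_back = Q(y, quantile_fns)           # Q(Y) recovers X
```

The round trip `x_back == x` (up to rounding) reflects the statement that $X = Q(Y)$ almost surely when the marginals are continuous.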

Having obtained a multivariate version of (2.10), we are now ready to generalise
the univariate tail LDP (2.8) and its extension (2.11) to the multivariate context.
Concerning the latter, one would expect its multivariate generalisation to be like (3.12),
with $Q$ replaced by an approximation. Let $F_1,..,F_m$ satisfy log-GW tail limits with
scaling functions $g_1,..,g_m$ and log-GW indices $\theta_1,..,\theta_m$, respectively. As in (2.9), define
marginal quantile approximations

$$q_{j,y}(z) := \begin{cases} q_j(z) & \text{if } z \le y \\ q_j(y)e^{g_j(y)h_{\theta_j}(z/y)} & \text{if } z > y \end{cases} \tag{3.13}$$

for $y \in \bigcap_{j=1}^{m} g_j^{-1}((0,\infty))$, and let for all $x = (x_1,..,x_m) \in [0,\infty)^m$,

$$Q_y(x) := (q_{1,y}(x_1),..,q_{m,y}(x_m)). \tag{3.14}$$

Theorem 2 Let the random vector $X = (X_1,..,X_m)$ have distribution function $F$
with continuous marginals $F_1,..,F_m$ having positive endpoints.

(a) If $Y$ defined by (3.11) satisfies (3.2) and the marginals satisfy log-GW tail limits
(2.6) with $q = q_j$, $g = g_j$ and $\theta = \theta_j$ for $j = 1,..,m$, then $X$ satisfies

$$-\inf I(A^\circ) \le \liminf_{y \to \infty} \frac{1}{y}\log P(X \in Q_y(yA)) \le \limsup_{y \to \infty} \ldots \le -\inf I(\bar{A}) \tag{3.15}$$

for every Borel set $A \subset [0,\infty)^m$, with $Q_y$ given by (3.13) and (3.14), and $I$ a good
rate function satisfying (3.4), (3.7) and $I(0) = 0$.

(b) If $X$ satisfies (3.15) for every Borel set $A \subset [0,\infty)^m$ with $Q_y$ given by (3.13) and
(3.14) and with rate function $I$ satisfying (3.7), then the marginals satisfy log-GW tail
limits, and $Y$ defined by (3.11) satisfies (3.1) with good rate function $I$ satisfying
(3.4) and $I(0) = 0$.


The proof can be found in Subsection 8.1.

Remark 5 This theorem justifies viewing (3.1) as representation of tail dependence
within the context of the LDP (3.15), which also represents the marginal tails. The
relationship between the LDPs (3.15) and (3.1) is the large deviations analogue of
a similar relationship in classical extreme value theory; compare e.g. Resnick (1987),
Propositions 5.10 and 5.15.

From the multivariate generalisation of (2.11), we can now also derive a multivariate
version of (2.8), equivalent to the restriction of (3.15) to $A \subset [1,\infty)^m$:

Corollary 1 Let $\theta := (\theta_1,..,\theta_m)$ and $H_\theta(z) := (h_{\theta_1}(z_1),..,h_{\theta_m}(z_m))$ for all
$z \in (0,\infty)^m$. Then (3.15) implies for every Borel set $D \subset [0,\infty)^m$:

$$-\inf I(H_\theta^{-1}(D^\circ)) \le \liminf_{y \to \infty} \frac{1}{y}\log P\left(\left(\frac{\log X_1 - \log q_1(y)}{g_1(y)},..,\frac{\log X_m - \log q_m(y)}{g_m(y)}\right) \in D\right) \le \limsup_{y \to \infty} \ldots \le -\inf I(H_\theta^{-1}(\bar{D})). \tag{3.16}$$

Proof See Subsection 8.1.


Note that (3.16) only addresses events within $(q_1(y),\infty) \times .. \times (q_m(y),\infty)$, which
is "covered" by all marginal log-GW tail approximations simultaneously. Just as (2.8),
it can be extended somewhat. However, the main interest of (3.16) is that it shows the
multivariate tail LDP explicitly as a pair of asymptotic bounds for the probabilities of
extreme events defined in terms of affinely normalised logarithms of the components
of $X$. For applications in statistics, (3.15) should be more useful, as it applies also to
events which are not simultaneously extreme in every component of $X$.


































4 A connection to residual tail dependence and related models 

In this section, we digress from the main storyline to examine an interesting connection
between the theory of Section 3 and earlier work on residual tail dependence (RTD) or
hidden regular variation, introduced in Ledford & Tawn (1996, 1997, 1998) and studied
in depth in Resnick (2002), amongst others. In the bivariate case, RTD offers a model of
tail dependence within the classical domain of asymptotic independence of componentwise
maxima (e.g. de Haan & Ferreira (2006), Section 7.6). For a random vector $X$ on
$\mathbb{R}^m$ with continuous marginals $F_1,..,F_m$, defining the random vector $V := (V_1,..,V_m)$
with standard Pareto-distributed variables by (1.4) (equivalently, $V_j = e^{Y_j}$ with $Y_j$ as in
(3.11)), one way to describe RTD is that for some positive function $S$ on $(0,\infty)^m$,

$$\lim_{t \to \infty} \frac{P(V_j > tx_j\ \forall j \in \{1,..,m\})}{P(V_j > t\ \forall j \in \{1,..,m\})} =: S(x) > 0 \tag{4.1}$$

for all $x \in (0,\infty)^m$. The limiting function $S$ satisfies $S(\mathbf{1}) = 1$, with $\mathbf{1}$ the vector in
$\mathbb{R}^m$ with all its components equal to 1. Furthermore, the denominator in (4.1) must
be regularly varying, so $S(\mathbf{1}\lambda) = \lambda^{-1/\eta}$ for all $\lambda > 0$ with $\eta \in (0,1]$ the residual
dependence index, and by (4.1),

$$S(x\lambda) = \lambda^{-1/\eta}S(x) \quad \forall x \in (0,\infty)^m,\ \lambda > 0. \tag{4.2}$$

Every regularly varying function $f \in RV_\alpha$ can be represented as

$$f(y) = c(y)\exp\left(\int_{y_0}^{y} a(t)\,t^{-1}\,dt\right) \tag{4.3}$$

with $c(y) \to c_0 > 0$ and $a(y) \to \alpha$ as $y \to \infty$. A minor strengthening of regular variation
is that $f$ satisfies the Von Mises condition (see e.g. Proposition 1.15 of Resnick (1987)),
which means that $c$ in (4.3) can be taken equal to a positive number $c_0$; it implies
that $f$ is differentiable with derivative $f'(y) = a(y)f(y)/y$. Note that whenever the
LDP (3.1) holds and $\inf I(A) \in (0,\infty)$ for a particular Borel set $A$, then the function
$(y \mapsto -\log P(Y/y \in A))$ is in $RV_1$. Therefore, within the context of the LDP (3.1),
the statement that $(y \mapsto -\log P(Y/y \in A))$ satisfies the Von Mises condition makes
sense as a smoothness condition. The following relates RTD to the tail LDP (3.1).


Proposition 3 (a) RTD (4.1) implies

$$\lim_{y \to \infty} \frac{1}{y}\log P(Y/y \in (\lambda,\infty)^m) = -\lambda/\eta \quad \forall \lambda > 0, \tag{4.4}$$

with $\eta$ the residual dependence index of $X$.

(b) If $X$ satisfies an LDP (3.1) with the function $(y \mapsto -\log P(Y/y \in (1,\infty)^m))$
satisfying the Von Mises condition, then (4.1) holds for $x = \mathbf{1}\lambda$ for all $\lambda > 0$ with
$S(\mathbf{1}\lambda) = \lambda^{-1/\eta}$ and $\eta = 1/I(\mathbf{1})$.

Proof Define $Y^\wedge := \min_{j \in \{1,..,m\}} Y_j$, and let $H^\wedge$ be the distribution function of $Y^\wedge$.
By (4.1), the survival function $1 - H^\wedge \circ \log$ of the random variable $\exp Y^\wedge$ is regularly
varying with index $-1/\eta$. Therefore, $f := 1/(1 - H^\wedge \circ \log) \in RV_{1/\eta}$, so by the
Potter bounds (Bingham et al. (1987)), for every $\varepsilon \in (0, 1/\eta)$, there is $z_\varepsilon > 0$ such
that $(1 - \varepsilon)(x/z)^{1/\eta - \varepsilon} \le f(x)/f(z) \le (1 + \varepsilon)(x/z)^{1/\eta + \varepsilon}$ for all $z \ge z_\varepsilon$ and $x \ge z$.
Taking logarithms and substituting $e^{y\lambda}$ for $x$ gives $\lim_{y \to \infty} y^{-1}\log f(e^{y\lambda}) = \lambda/\eta$ for
all $\lambda > 0$, so (4.4) follows. For (b), note that due to (3.4), the LDP (3.1) implies (4.4)
with $\eta = 1/I(\mathbf{1})$, so $w(y) := -\log(1 - H^\wedge(y)) \sim y/\eta$ as $y \to \infty$. Therefore, since $w$
satisfies the Von Mises condition, $w'(y) \to 1/\eta$ and by averaging, $w(y + r) - w(y) \to r/\eta$
as $y \to \infty$ for every $r \in \mathbb{R}$. This is equivalent to (4.1) for $x = \mathbf{1}\lambda$ with
$S(\mathbf{1}\lambda) = \lambda^{-1/\eta}$ for every $\lambda > 0$. □

Proposition 3 shows that RTD implies a limited LDP-like condition and in turn,
the LDP (3.1) with an additional smoothness condition implies an RTD-like condition.


Example 2 The bivariate normal $X$ of Example 1 satisfies the conditions for
Proposition 3(b) with $I(\mathbf{1}) = 2/(1 + \rho)$. Indeed, (4.1) holds for $x = \mathbf{1}\lambda$ for all $\lambda > 0$, with
$S(\mathbf{1}\lambda) = \lambda^{-1/\eta}$ and $\eta = (1 + \rho)/2$; see Example Class 2(1) in Ledford & Tawn (1996).


If $\lim_{t\to\infty} t\,P(V_j > t\ \forall j \in \{1,..,m\}) = 0$, then there is a discrepancy between the “hidden” regularity of the survival function in $(0,\infty)^m$ described by (4.1) and the regularity of the marginals. In contrast, the LDP provides a single consistent description of the multivariate tail which includes the marginal tails. Furthermore, the next theorem shows that under a smoothness assumption similar to the one in Proposition 3(b), the LDP (3.1) implies a useful extension of RTD. Let for all $a \in \mathbb{R}^m$,

$$A_a := \{x \in \mathbb{R}^m : x_j > a_j\ \forall j \in \{1,..,m\}\}. \tag{4.5}$$


Theorem 3 (a) Assume that the LDP (3.1) applies. To any Borel set $A \subset \mathbb{R}^m$ which is a continuity set of $I$ with $(y \mapsto -\log P(Y/y \in A))$ satisfying the Von Mises condition, the following limit relation applies:

$$\lim_{t\to\infty} \frac{P(Y \in A\log(t\lambda))}{P(Y \in A\log t)} = \lambda^{-\inf I(A)} \quad \forall \lambda > 0, \tag{4.6}$$

with $I$ satisfying (3.4), (3.7) and $I(0) = 0$. In particular, for every $a \in [0,\infty)^m$ such that the function $(y \mapsto -\log P(Y_j > y a_j\ \forall j \in \{1,..,m\}))$ satisfies the Von Mises condition,

$$\lim_{t\to\infty} \frac{P(V_j > (t\lambda)^{a_j}\ \forall j \in \{1,..,m\})}{P(V_j > t^{a_j}\ \forall j \in \{1,..,m\})} = \lambda^{-\inf I(A_a)} \quad \forall \lambda > 0. \tag{4.7}$$

(b) Eq. (4.6) with $\inf I(A) \in (0,\infty)$ implies (3.8).


Proof For $A$ a continuity set of $I$, (3.1) implies (3.8), and (4.6) is obtained in the same manner as in the proof of Proposition 3(b). In particular, $A_a$ is a continuity set of $I$ for every $a \in [0,\infty)^m$. Therefore, substituting $A_a$ for $A$ in (4.6), we obtain (4.7). This proves (a). For (b), note that $f := (t \mapsto 1/P(Y \in A\log t)) \in RV_{\inf I(A)}$. Therefore, just as in the proof of Proposition 3(a), $\lim_{y\to\infty} y^{-1}\log f(e^{y\lambda}) = \lambda\inf I(A)$ for all $\lambda > 1$, which implies (3.8). □


Combining (a) and (b) in Theorem 3, we see that under the Von Mises condition (for $A$ a Borel continuity set of $I$), the limit relation (4.6) for a probability ratio, and the limit relation (3.8) of the normalised logarithm of a probability are equivalent.

In the special case of $a = \mathbf{1}$, (4.7) becomes equivalent to (4.1) with $x = \mathbf{1}\lambda$ and $\eta = 1/I(\mathbf{1})$, so on the diagonal, (4.7) and RTD (4.1) agree; elsewhere, they differ. Defining a function $\kappa$ by $\kappa(a) := \inf I(A_a)$ for every $a \in [0,\infty)^m$, (4.7) becomes identical to an extension of RTD recently introduced in Wadsworth & Tawn (2013). Wadsworth & Tawn (2013) proposed this assumption to close the possible gap between (4.1) and the regularity of the marginal tails. It is curious that this condition, requiring the existence of separate limits of the survival function along chosen paths, is derivable from the simple LDP (3.1).


The generalisation of (4.7) with $\inf I(A_a)$ replaced by $\kappa(a)$ to the apparently new limit relation (4.6) is not trivial. Another generalisation, proposed in Wadsworth & Tawn (2013), is

$$\lim_{t\to\infty} \frac{P(Y \in B + a\log(t\lambda))}{P(Y \in B + a\log t)} = \lambda^{-\kappa(a)}, \tag{4.8}$$

derived in Section 3.3 of Wadsworth & Tawn (2013) for the bivariate case under the assumption that $\kappa$ is differentiable and $a \in [0,\infty)^m \setminus \{0\}$ satisfies $\partial\kappa(a)/\partial a_j > 0$ for $j = 1,..,m$. As noted in Wadsworth & Tawn (2013), $a$ in (4.8) would have to be chosen in an application. This would be no problem if the choice did not matter. However, the limiting behaviour of the probability of the event $Y \in B + a\log t$ as $t \to \infty$ is determined by $a$ in (4.8); not by $B$. Therefore, for estimating probabilities of extreme events, (4.6) seems more promising than the local limits (4.8) for chosen $a$.

In (4.6), it is not $\kappa$, but the rate function $I$ which determines the attenuation rate. For any $a \in [0,\infty)^m$, $I(a)$ and $\kappa(a)$ are identical only if $I(a + x) \ge I(a)$ for all $x \in [0,\infty)^m$. This condition is rather restrictive, as a rate function resembles a density more than it resembles a survival function; see definition (3.2).


Example 3 As an illustration, let $X$ be bivariate normal with correlation coefficient $\rho$ as in Example 1 (Section 3). By Example 1(a,b) in Table 1 of Wadsworth & Tawn (2013), $\kappa(x) = I(x)$ if $\min(x_1/x_2, x_2/x_1) > \rho^2$ or if $\rho < 0$ and $\min(x_1,x_2) > 0$, and $\kappa(x) = \max(x_1,x_2)$ for all other $x \in [0,\infty)^2$. The left and middle panels of Figure 3.1 display contours of $\kappa$ overlaying the contours of $I$ for $\rho = 0.8$ and $\rho = 0.2$. For $\rho = 0.2$, contours of $\kappa$ largely overlap with those of $I$; for $\rho = 0.8$, there are wide zones where the contours of $\kappa$ and $I$ differ.
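
The difference between $\kappa$ and $I$ in Example 3 can be explored numerically. The following is a minimal sketch, not taken from the paper: the closed form used for `rate_I` is an assumption (the quadratic form of the Gaussian log-density rewritten for exponential marginals, valid for strictly positive coordinates), chosen because it reproduces $I(\mathbf{1}) = 2/(1+\rho)$ from Example 2; `kappa` implements the case distinction quoted in Example 3. Both function names are hypothetical.

```python
import math

def rate_I(x1, x2, rho):
    """ASSUMED rate function of the bivariate normal on exponential scale, for x1, x2 > 0."""
    return (x1 + x2 - 2.0 * rho * math.sqrt(x1 * x2)) / (1.0 - rho ** 2)

def kappa(x1, x2, rho):
    """kappa(x) following the case distinction quoted in Example 3."""
    if min(x1, x2) <= 0.0:
        # on the axes the ratio test is undefined; Example 3 assigns max(x1, x2) here
        return max(x1, x2)
    if min(x1 / x2, x2 / x1) > rho ** 2 or rho < 0.0:
        return rate_I(x1, x2, rho)
    return max(x1, x2)

if __name__ == "__main__":
    for rho in (0.8, 0.2):
        for x1 in (0.02, 0.1, 0.5, 1.0):
            print(rho, x1, round(rate_I(x1, 1.0, rho), 3), round(kappa(x1, 1.0, rho), 3))
```

Under this assumed form of $I$, the two functions agree on the diagonal and on the boundary $x_1/x_2 = \rho^2$, while for $\rho = 0.8$ and $x = (0.1, 1)$ the sketch gives $\kappa(x) = 1 < I(x)$, in line with the wide zones of disagreement mentioned above.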


5 A simple estimator for very small probabilities 


We are now going to apply the theory of Section 3 to the problem of estimation of probabilities of extreme events $p_n$ satisfying (1.1) from $X^{(1)},..,X^{(n)}$, with $X^{(1)},X^{(2)},...$ a sequence of iid copies of a random vector $X$ in $\mathbb{R}^m$ with distribution function $F$ having continuous marginals $F_1,..,F_m$. Denoting the underlying probability space as $(\Omega,\mathcal{F},P)$, let $\mathcal{F}_n \subset \mathcal{F}$ be the $\sigma$-algebra generated by $X^{(1)},...,X^{(n)}$.

Generalising (1.1) to $\tau_2 > \tau_1 > 0$, consider events of the form $B_n := Q(A\log n)$ with $A \subset [0,\infty)^m$ and $Q$ given by (3.10). Suppose that the tail LDP (3.1) applies. Then for every Borel set $A \subset [0,\infty)^m$ which is a continuity set of $I$ satisfying $\inf I(A) \in (0,\infty)$, we have: $-\log P(X \in B_n) = -\log P(Y \in A\log n) \sim (\log n)\inf I(A)$ and $-\log P(Q(Y/\ell) \in B_n) = -\log P(Y \in A\ell\log n) \sim \ell(\log n)\inf I(A)$ for all $\ell > 0$, so

$$\log P(X \in B_n) \sim \ell^{-1}\log P(Q(Y/\ell) \in B_n) \quad \forall \ell > 0. \tag{5.1}$$


This suggests estimating the left-hand side of (5.1) by replacing $Q$ on the right-hand side by an estimator $\hat Q_n$ and $Y$ by an estimator $\hat Y_n$, and then choosing $\ell$ small enough that $P(\hat Q_n(\hat Y_n/\ell) \in B_n)$ can be estimated directly from the data by counting.

Estimation of $Q$ boils down to a univariate quantile estimation problem, so we will proceed to examine this first. Assume that every marginal satisfies a log-GW tail limit (i.e., the univariate tail LDP, see Section 2). Let $X_{j,1:n} \le ... \le X_{j,n:n}$ be the marginal order statistics derived from the marginal sample $X_j^{(1)},..,X_j^{(n)}$. For some intermediate sequence $(k_n)$ and for $n$ large enough that $X_{j,n-k_n+1:n} > 0$ for $j = 1,..,m$, define the following estimator $\hat q_{j,n}$ for $q_j$ (compare (3.13)):


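$$\hat q_{j,n}(z) := \begin{cases} X_{j,\lfloor n(1-e^{-z})\rfloor+1:n} & \text{if } z \in [0, y_n] \\ X_{j,n-k_n+1:n}\exp\big(\hat g_{j,n}\,h_{\hat\theta_{j,n}}(z/y_n)\big) & \text{if } z > y_n \end{cases} \tag{5.2}$$

with

$$y_n := \log(n/k_n). \tag{5.3}$$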
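
The two branches of (5.2) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function $h_\theta$ is defined in (3.13), outside this section, and is assumed here to be the Box-Cox-type function $h_\theta(\lambda) = (\lambda^\theta - 1)/\theta$ (with $\log\lambda$ at $\theta = 0$); the values `theta_hat` and `g_hat` are taken as given inputs rather than estimated. The names `h` and `q_hat` are hypothetical.

```python
import math

def h(theta, lam):
    """ASSUMED form of h_theta: (lam**theta - 1)/theta, with log(lam) at theta = 0."""
    return math.log(lam) if theta == 0.0 else (lam ** theta - 1.0) / theta

def q_hat(z, x_sorted, k, theta_hat, g_hat):
    """Sketch of (5.2) for one marginal; x_sorted holds the ascending order statistics."""
    n = len(x_sorted)
    y_n = math.log(n / k)                      # (5.3)
    if z <= y_n:
        # empirical quantile X_{floor(n(1 - e^{-z})) + 1 : n}, 1-based index capped at n
        idx = min(int(math.floor(n * (1.0 - math.exp(-z)))) + 1, n)
        return x_sorted[idx - 1]
    # log-GW extrapolation from the order statistic X_{n-k+1:n}
    return x_sorted[n - k] * math.exp(g_hat * h(theta_hat, z / y_n))
```

Because the assumed $h_\theta$ vanishes at $\lambda = 1$, the tail branch starts from the order statistic $X_{j,n-k_n+1:n}$ at $z = y_n$ and increases with $z$ whenever `g_hat` is positive.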

For $z > y_n$, $\hat q_{j,n}(z)$ follows a log-GW tail with $\hat\theta_{j,n}$ and $\hat g_{j,n}$ estimators for $\theta_j$ and $g_j$ in (3.13), respectively; for other $z$, the empirical quantile is used as estimator. The only assumption we will make on the quantile estimator is that the probability-based quantile estimation error $\nu_{j,n}$, defined by

$$\nu_{j,n}(z) := \frac{\log(1 - F_j(\hat q_{j,n}(z)))}{\log(1 - F_j(q_j(z)))} - 1 = z^{-1}q_j^{-1}(\hat q_{j,n}(z)) - 1 \tag{5.4}$$

for $z > 0$, satisfies

$$\lim_{n\to\infty}\sup_{\lambda\in[1,\Lambda]} |\nu_{j,n}(y_n\lambda)| = 0 \quad \text{a.s.} \quad \forall \Lambda > 1,\ j = 1,..,m. \tag{5.5}$$

Estimators $\hat\theta_{j,n}$ and $\hat g_{j,n}$ in (5.2) satisfying this requirement were considered in de Valk (2014). Let

$$\hat Q_n(x) := (\hat q_{1,n}(x_1),..,\hat q_{m,n}(x_m)) \tag{5.6}$$


for every $x \in [0,\infty)^m$. Define the following estimator for $Y_j^{(i)} := -\log(1 - F_j(X_j^{(i)}))$:

$$\hat Y_{j,n}^{(i)} := -\log\big(1 - (R_j^{(i)} - 1)/n\big) \tag{5.7}$$

with $R_j^{(i)} := \sum_{l=1}^n 1(X_j^{(l)} \le X_j^{(i)})$ the marginal rank of $X_j^{(i)}$.
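
As an illustration of the rank transformation (5.7), the sketch below maps each marginal sample to an approximately standard exponential scale. It assumes continuous marginals (no ties) and the offset $R - 1$ as read from (5.7) and (8.26); the function name `exp_scale_ranks` is hypothetical.

```python
import numpy as np

def exp_scale_ranks(x):
    """Rank-based estimate of Y = -log(1 - F_j(X_j)) for each column of an (n, m) array."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # With no ties, the double argsort gives R = #{l : x_l <= x_i} per column (1-based ranks).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return -np.log(1.0 - (ranks - 1.0) / n)
```

With this convention the smallest observation in each column is mapped to 0 and the largest to $\log n$.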

For every $n$-tuple of events $C_n := (C_n^{(1)},..,C_n^{(n)})$ satisfying $C_n^{(i)} \in \mathcal{F}_n$ for $i = 1,..,n$, define the “empirical probability” $\hat p_n(C_n) := \omega \mapsto \hat p_n(C_n)(\omega)$ on $\Omega$ by

$$\hat p_n(C_n)(\omega) := n^{-1}\sum_{i=1}^n 1(\omega \in C_n^{(i)}). \tag{5.8}$$

For some $\epsilon > 0$, determine a value of the analogue of $\ell$ in (5.1) as

$$\ell_n^+(B) := \sup\{l > 0 : \hat p_n(\hat Q_n(\hat Y_n/l) \in B) \ge (k_n/n)^\epsilon\}, \tag{5.9}$$

with $\sup\{\emptyset\} := 0$ and with $\hat p_n(\hat Q_n(\hat Y_n/l) \in B) = n^{-1}\sum_{i=1}^n 1(\hat Q_n(\hat Y_n^{(i)}/l) \in B)$ in accordance with (5.8).

Let $\pi$ denote the probability measure corresponding to $F$. Now consider the following estimator for $\pi(B) := P(X \in B)$:

$$\hat\pi_n(B) := (k_n/n)^{\epsilon/\ell_n^+(B)}. \tag{5.10}$$
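
To make the construction (5.9)-(5.10) concrete, here is a minimal sketch for the simplified situation in which the marginals are known to be standard exponential, so that $\hat Q_n$ is the identity and the sample itself plays the role of $\hat Y_n$. The supremum in (5.9) is approximated by scanning a descending grid of candidate values, which is an assumption of this sketch and not the paper's algorithm; the names `ell_plus`, `pi_hat` and `in_B` are hypothetical.

```python
import numpy as np

def ell_plus(y, in_B, k, eps=1.0, grid=None):
    """Grid approximation of (5.9); in_B maps an (n, m) array to a boolean vector."""
    n = y.shape[0]
    if grid is None:
        grid = np.linspace(2.0, 0.005, 2000)    # descending candidate values of l (assumption)
    level = (k / n) ** eps
    for l in grid:
        # empirical probability that the stretched point y / l falls in B
        if np.mean(in_B(y / l)) >= level * (1.0 - 1e-12):
            return float(l)                     # largest grid value meeting the condition
    return 0.0                                  # convention sup(empty set) = 0

def pi_hat(y, in_B, k, eps=1.0, grid=None):
    """Estimator (5.10): (k/n) ** (eps / ell_plus); returns 0 if no grid value qualifies."""
    n = y.shape[0]
    l = ell_plus(y, in_B, k, eps, grid)
    return 0.0 if l == 0.0 else (k / n) ** (eps / l)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.exponential(size=(5000, 2))
    # B = {x : x_1 > 15}; the exact probability is exp(-15)
    print(pi_hat(y, lambda v: v[:, 0] > 15.0, k=20), np.exp(-15.0))
```

For a set such as $\{x : x_1 > c\}$ the estimate reduces to $(k_n/n)^{c/Y_{1,n-k_n+1:n}}$ up to grid resolution, which is close to $e^{-c}$ on the logarithmic scale because $Y_{1,n-k_n+1:n} \approx \log(n/k_n)$.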


If $B_{n,\tau} := Q(A\tau\log n)$ is substituted for $B$, then under mild restrictions on $A$ and $(k_n)$, this estimator converges in the large deviation sense for all $\tau > 0$:

Theorem 4 Let $X^{(1)},X^{(2)},...$ be iid copies of a random vector $X$ on $\mathbb{R}^m$ satisfying the conditions of Theorem 2(a), including continuous marginals satisfying log-GW tail limits. For a sequence $(k_n)$ satisfying

$$0 < c' := \liminf_{n\to\infty}\frac{\log k_n}{\log n} \le \limsup_{n\to\infty}\frac{\log k_n}{\log n} =: c < 1, \tag{5.11}$$

consider the estimator (5.10) for $P(X \in B)$, with the quantile estimator (5.2) satisfying (5.5) and with $\epsilon \in (0,(1-c')^{-1})$. Then for $B_{n,\tau} := Q(A\tau\log n)$, with $A \subset [0,\infty)^m$ any Borel set which is a continuity set of $I$ defined by (3.2) and (3.11) and satisfies $\inf I(A) \in (0,\infty)$,

$$\lim_{n\to\infty}\sup_{\tau\in[T^{-1},T]}\left|\frac{\log\hat\pi_n(B_{n,\tau})}{\log P(X \in B_{n,\tau})} - 1\right| = 0 \quad \text{a.s.} \quad \forall T > 1. \tag{5.12}$$

The proof can be found in Subsection 8.3.


Remark 6 By (3.12), as $\inf I(A^\circ) = \inf I(A)$, $P(X \in B_{n,\tau}) = n^{-\tau\inf I(A)(1+o(1))}$ in (5.12), so the probability range (1.1) is covered by Theorem 4.


In practice, computing or approximating (5.9) may not be easy; for example, in engineering applications, it may involve running a complex numerical model for every datapoint. Therefore, it would be an advantage to replace $\ell_n^+(B)$ in (5.10) by an arbitrary value in some suitable interval. Define for some $d \in (0,\epsilon]$

$$\ell_n^-(B) := \sup\{l > 0 : \hat p_n(\hat Q_n(\hat Y_n/l) \in B) \ge (k_n/n)^d\}. \tag{5.13}$$

Then $\ell_n^-(B) \le \ell_n^+(B)$. Let $\ell_n(B)$ be the result of an algorithm designed to satisfy

$$\ell_n(B) \in [\ell_n^-(B),\ell_n^+(B)]; \tag{5.14}$$

for the present analysis, it is sufficient to assume that $\ell_n(B)$ is a random variable satisfying (5.14). Now consider the following generalisation of the estimator (5.10) for $\pi(B) := P(X \in B)$:

$$\tilde\pi_n(B) := \big(\hat p_n(\hat Q_n(\hat Y_n/\ell_n(B)) \in B)\big)^{1/\ell_n(B)}. \tag{5.15}$$


Theorem 5 For $X^{(1)},X^{(2)},...$, $(k_n)$ and $c'$ as in Theorem 4, consider the estimator (5.15) for $P(X \in B)$, with the quantile estimator (5.2) satisfying (5.5) and with $\epsilon \in (0,(1-c')^{-1})$ and $d \in (0,\epsilon]$. Then for $B_{n,\tau}$ as in Theorem 4,

$$\lim_{n\to\infty}\sup_{\tau\in[T^{-1},T]}\left|\frac{\log\tilde\pi_n(B_{n,\tau})}{\log P(X \in B_{n,\tau})} - 1\right| = 0 \quad \text{a.s.} \quad \forall T > 1. \tag{5.16}$$

The proof can be found in Subsection 8.2.

The constraints on $(k_n)$, $\epsilon$ and $d$ ensure that $n\hat p_n(\hat Q_n(\hat Y_n/\ell_n(B_{n,\tau})) \in B_{n,\tau})$ is eventually bounded by powers of $n$ with exponents in $(0,1)$. This does not seem restrictive for applications.

In practice, based on a few trial values of $\ell_n(B)$ which give “acceptable” numbers of $n\hat p_n(\hat Q_n(\hat Y_n/\ell_n(B)) \in B)$, one could check the stability of $\tilde\pi_n(B)$ with respect to $n\hat p_n(\hat Q_n(\hat Y_n/\ell_n(B)) \in B)$.
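
The stability check described above can be sketched as follows, again assuming known standard exponential marginals so that $\hat Q_n$ is the identity. For each trial value of $\ell_n(B)$ the sketch reports the number of stretched datapoints in $B$ and the corresponding value of (5.15). The helper names `pi_tilde_given_ell` and `stability_table` are hypothetical.

```python
import numpy as np

def pi_tilde_given_ell(y, in_B, ell):
    """Count of stretched points in B and estimate (5.15) for a given trial value ell."""
    n = y.shape[0]
    count = int(np.sum(in_B(y / ell)))
    estimate = (count / n) ** (1.0 / ell) if count > 0 else 0.0
    return count, estimate

def stability_table(y, in_B, ells):
    """One row (ell, count, estimate) per trial value, for visual inspection of stability."""
    return [(ell,) + pi_tilde_given_ell(y, in_B, ell) for ell in ells]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.exponential(size=(5000, 2))
    for row in stability_table(y, lambda v: v[:, 0] > 15.0, [0.30, 0.35, 0.40]):
        print(row)
```

If the estimates in such a table vary little over the range of trial values with acceptable counts, the choice of $\ell_n(B)$ within that range is not critical.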






























6 Numerical examples 


First, we will discuss simulations, considering the case of a bivariate normal random vector $U$ with standard normal marginals and correlation coefficient $\rho = 0.5$. We are not yet concerned with marginal estimation, so for $X$, we take the random vector with standard exponential marginals obtained from $U$ by marginal transformations; therefore, $X = Y$ in this case.

As extreme events, we will consider halfspaces, i.e., $U \in \{x \in \mathbb{R}^2 : a_1x_1 + a_2x_2 > c\}$ for some $a \in \mathbb{R}^2$ and $c > 0$; their probabilities are easily calculated. In terms of $X$, these events are represented by $X \in B$ with

$$B = \{x \in [0,\infty)^m : a_1\Phi^{-1}(1-e^{-x_1}) + a_2\Phi^{-1}(1-e^{-x_2}) > c\}. \tag{6.1}$$

Experiments were performed with $a_2 = 1$ and with several different values of $a_1$, with $c$ in each case chosen to ensure that $P(X \in B) = 4\cdot10^{-8}$. In all experiments, $n = 5000$, and the estimator (5.10) was applied with $\epsilon = 1$ and $k_n = 20$.
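
The remark that the halfspace probabilities are easily calculated can be made explicit: $a_1U_1 + a_2U_2$ is normal with mean 0 and variance $a_1^2 + a_2^2 + 2\rho a_1a_2$, so $c$ follows from a normal quantile. The sketch below uses only the Python standard library; the function names are hypothetical, and the membership test for (6.1) assumes strictly positive coordinates.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def halfspace_threshold(a1, a2, rho, p):
    """Threshold c such that P(a1*U1 + a2*U2 > c) = p for a standard bivariate normal U."""
    sigma = math.sqrt(a1 ** 2 + a2 ** 2 + 2.0 * rho * a1 * a2)
    # upper-tail quantile via symmetry, which is numerically safer than inv_cdf(1 - p)
    return -sigma * STD_NORMAL.inv_cdf(p)

def in_B(x1, x2, a1, a2, c):
    """Membership test for the set B in (6.1), for x1, x2 > 0 on the exponential scale."""
    # Phi^{-1}(1 - e^{-x}) written as -Phi^{-1}(e^{-x}) to keep precision for large x
    u1 = -STD_NORMAL.inv_cdf(math.exp(-x1))
    u2 = -STD_NORMAL.inv_cdf(math.exp(-x2))
    return a1 * u1 + a2 * u2 > c

if __name__ == "__main__":
    c = halfspace_threshold(0.5, 1.0, 0.5, 4e-8)
    print(c, in_B(20.0, 20.0, 0.5, 1.0, c))
```

A function like `in_B`, vectorised over a sample, is all that an estimator of the form (5.10) needs to know about the event.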

Our first case concerns $a_1 = 0.5$; $B$ is shown as a grey patch in Figure 6.1. Figure 6.1(a) shows $\hat Y_n$, which has no datapoint in $B$. The stretched data cloud $\hat Y_n/\ell_n^+(B)$ is shown in Figure 6.1(b), with $k_n = 20$ datapoints in $B$; $\ell_n^+(B)$ equals 0.334. According to (5.10), the probability of $B$ is estimated as $(20/5000)^{1/0.334} = 6.6\cdot10^{-8}$.

To appreciate how this estimator differs from the classical approach, an estimator similar to (5.10) but based on the classical multivariate tail limit (1.5) was applied as well: $\check\pi_n(B) := (k_n/n)e^{-\lambda_n(B)}$ with $\lambda_n(B) := \inf\{l > 0 : \hat p_n(\hat Y_n + l \in B) \ge k_n/n\}$. It is similar to the estimator considered in de Haan & Sinha (1999) and Drees & de Haan (2013) without marginal estimation. Figure 6.1(c) shows $\hat Y_n + \lambda_n(B)$ with $\lambda_n(B) = 8.92$; the corresponding probability estimate equals $5.4\cdot10^{-7}$. The qualitative difference between Figures 6.1(b) and 6.1(c) is striking.
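
The two estimates quoted for this first case can be reproduced from the reported values $\ell_n^+(B) = 0.334$ and $\lambda_n(B) = 8.92$. The short check below also puts both estimates on the log-probability scale used in Theorems 4 and 5; the variable names are illustrative, and the inputs are the rounded values from the text, so the last digit may differ slightly from the figures quoted there.

```python
import math

n, k = 5000, 20
p_true = 4e-8                                  # target probability used in the simulations

ldp_estimate = (k / n) ** (1.0 / 0.334)        # estimator (5.10) with eps = 1
classical_estimate = (k / n) * math.exp(-8.92) # classical analogue based on (1.5)

# ratios of log-probabilities, the error measure in (5.12) and (5.16)
ldp_log_ratio = math.log(ldp_estimate) / math.log(p_true)
classical_log_ratio = math.log(classical_estimate) / math.log(p_true)

if __name__ == "__main__":
    print(ldp_estimate, classical_estimate)
    print(ldp_log_ratio, classical_log_ratio)
```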

Figure 6.2 summarises the results of simulations with $a_1 = 1, 0.5, 0.1, 0, -0.1, -0.5$ and $a_2 = 1$, in each case with 500 realisations: (a) shows the boundaries of the events considered, labelled by $a_1$; (b) shows the root mean square errors (RMSE) of the logarithms of the probability estimates, and (c) shows the bias of the logarithms of the probability estimates. For the bivariate normal with $\rho < 1$, the limiting measure in (1.5) is concentrated on the boundaries, so the classical estimator is not expected to do well in this case. Therefore, in addition, a correction of the classical estimator based on residual tail dependence (4.1) was applied, cf. Draisma et al. (2004).

The results in Figure 6.2 indicate that the standard deviation is generally small in comparison to the bias, despite the small value of $k_n$ used. The two classical estimators perform better or worse depending on the value of $a_1$, but the LDP-based estimator (5.10) performs consistently as well as or better than both classical estimators in all cases.

The probability estimator (5.10) can also be applied to estimate the survival function. This makes it possible to compare it to the estimator for the survival function proposed in Wadsworth & Tawn (2013) (Sections 5.1 and 5.2) based on (4.7) with $\inf I(A_a)$ replaced by $\kappa(a)$, estimated using an approach employing the Hill estimator. With both estimators, the same simulations were carried out as reported in Section 5.3 of Wadsworth & Tawn (2013): for $X$ considered above, estimates of the survival function $F^c(x_1,x_2)$ were made with $x_2 = 1.5\log n$ and $x_1/x_2 = 0.05, 0.10,..., 0.50$. With $n = 5000$, $k_n = 20$, 500 realisations, and $\hat Y_n$ replaced by the exact $Y = X$ as in Wadsworth & Tawn (2013), the RMSE of the logarithm of probability for (5.10) was 11-17% higher than for the estimator from Wadsworth & Tawn (2013), which performed similarly to an estimator based on the conditional probability approach of Heffernan & Tawn (2004) (see Section 5.3 of Wadsworth & Tawn (2013)). This is an encouraging result for an estimator as simple and widely applicable as (5.10).

Fig. 6.1 Simulation with bivariate normal dependence and exponential marginals (see text). From left to right: (a) $\hat Y_n$ (dots) and failure event $B$ given by (6.1) (grey); (b) $\hat Y_n/\ell_n(B)$ and failure event; (c) the classical analogue $\hat Y_n + \lambda_n(B)$ of (b) (see main text).

Fig. 6.2 Simulations with bivariate normal dependence and exponential marginals (see text). From left to right: (a) boundaries of failure events (6.1) labelled by $a_1$, for $a_2 = 1$; (b) RMSE of the logarithm of probability as function of $a_1$ for estimator (5.10) (circles), its classical analogue (diamonds) and its classical analogue accounting for residual tail dependence (squares); (c) bias for the estimators as under (b).

The final case is a trial application of the estimator (5.15) to an oceanographic dataset, in order to estimate the mean fraction of time that wave run-up reaches the crest of a fictitious seawall. Figure 6.3 (upper right) shows simultaneous 3-hourly values of wave period measured at the offshore site YM6 in the North Sea and surge level from the nearby harbour of IJmuiden provided by Rijkswaterstaat, the Netherlands^5. The dataset covers 24 years ($n = 70128$). For this trial, a strongly simplified version of a model from TAW (2002) is used to approximate the run-up height of the 2% highest waves on the seawall from wave period and still water level. The set $B$ of wave period/water level combinations leading to wave run-up exceeding 15m is indicated by the grey area in the same figure. In the model, the mean depth at the seawall base is 0m and the seawall has a flat smooth 1:4 slope. The RMS wave height at the base is approximated by its upper bound from Ruessink et al. (2003). For the water level, we use surge data, ignoring the astronomical tide. Dependence on wave direction is ignored in marginals and nearshore wave transformation. Because of all these simplifications, estimates obtained do not carry concrete relevance to coastal flood safety.

5 Wave period is $T_{m-1,0}$ (s); surge is still water level minus estimated astronomical tide (m).

Fig. 6.3 Marginal log-GW tail estimates for wave period $X_1$ (upper left) and surge level $X_2$ (upper middle); sample of $X$ and set $B$ corresponding to wave run-up exceeding 15m (upper right); $\hat Y_n$ (lower left); $\hat Y_n/\ell_n(B)$ (lower middle); $\hat Q_n(\hat Y_n/\ell_n(B))$ (lower right); fat dots indicating points with run-up exceeding 15m.

Quantile estimates for wave period and surge were made using the simple log-GW-based quantile estimator from de Valk (2014): let $(k_{2,n})$ be a nondecreasing intermediate sequence satisfying that $\limsup_{n\to\infty}\log k_{2,n}/\log n < 1$ and $\lim_{n\to\infty} k_{2,n}/\log_2 n = \infty$ (with $\log_2$ the iterated logarithm), fix some $\iota > 1$, and define


kin • — 


| (n/k 2 , n ) 


for i 6 {0,1}. 


( 6 . 2 ) 


Taking $k_n = k_{0,n}$ in (5.2), and


9n,j — 


1 x i 
lo S2 XT 


- log; 


2 -V, 


log; 


and 


1 x '> 


9n,j — 


t - — fc 0,n+ 1:1 

h L , (0 


(6-3) 

(5.5) is ensured by Theorem 4 of de Valk (2014). Like the Pickands (1975) estimator for the extreme value index, the estimator $\hat\theta_n$ is based on only three order statistics. However, its behaviour is entirely different, because the spacings between $(k_{0,n})$, $(k_{1,n})$ and $(k_{2,n})$ are different.

Quantile estimates (5.2) with (6.2)-(6.3), $\iota = 2$ and $k_n = 5009$ are shown in the upper left and middle panels of Figure 6.3. The lower left panel shows $\hat Y_n$, the lower middle panel shows $\hat Y_n/\ell_n(B)$ with $\ell_n(B) = 1/2.13$; $\hat Q_n(\hat Y_n/\ell_n(B))$ is shown in the lower right panel. In this case, $\hat p_n(\hat Q_n(\hat Y_n/\ell_n(B)) \in B) = 41/n = 5.85\cdot10^{-4}$, so for the estimator (5.15), $\tilde\pi_n(B) = (41/n)^{2.13} = 1.3\cdot10^{-7}$, about 11 hours per 10,000 years.

Contrary to the assumptions made earlier, the 3-hourly surge and wave period are serially dependent. Since we are estimating a fraction of time, serial dependence does not need to invalidate the estimate; its principal effect is that the estimate is less precise than it would have been if the process were iid. Imposing a minimum separation of 24 hours between storm events, the 41 datapoints moved into $B$ in Figure 6.3 (lower right) represent 18 distinct events, giving a mean duration per event of 6.8 hours. Using this value, the estimate $\tilde\pi_n(B)$ can be converted to an estimate of the frequency of wave run-up exceeding 15m; its value is $1.7\cdot10^{-4}$ per year. Evidently, this unconventional, but intuitively appealing variation of the peaks-over-threshold approach would need formal underpinning by a model of serial dependence in order to be taken seriously.
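
The numbers quoted for this trial application follow from simple arithmetic, which the short check below retraces from the reported inputs (41 datapoints in $B$, $1/\ell_n(B) = 2.13$, 18 distinct events, 3-hourly data over 24 years). The variable names are illustrative, and a year is taken as 365.25 days, which is an assumption of this sketch.

```python
n = round(24 * 365.25 * 8)                  # 3-hourly values over 24 years
count, inv_ell, events = 41, 2.13, 18

fraction_of_time = (count / n) ** inv_ell   # estimator (5.15)
hours_per_year = 365.25 * 24
hours_per_10000_years = fraction_of_time * hours_per_year * 1e4

mean_duration_hours = count * 3 / events    # each datapoint represents 3 hours
events_per_year = fraction_of_time * hours_per_year / mean_duration_hours

if __name__ == "__main__":
    print(n, fraction_of_time, hours_per_10000_years)
    print(mean_duration_hours, events_per_year)
```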


7 Discussion 


Like similar methods in the classical setting (e.g. de Haan & Sinha (1999); Drees & de Haan (2013); Draisma et al. (2004)), the estimators (5.10) and (5.15) exploit homogeneity of a function describing tail dependence; in this case, homogeneity (3.4) of the rate function $I$. This offers the advantage that no explicit estimate of $I$ is required. However, in certain situations, there may be good reasons to estimate $I$, for example when, for a given random vector $X$, probabilities need to be estimated for multiple sets in a consistent and reproducible manner. Therefore, estimation of $I$ remains a topic deserving elaboration.

The limitation of $A$ to continuity sets of $I$ in Theorems 4 and 5 is less restrictive than it may seem, since the homogeneity of $I$ makes continuity sets rather common, as noted in Section 3. The other conditions on $A$ are weak.

To prove convergence of the estimators under such weak conditions, local uniformity in $d$ of convergence in (8.5) is employed, which is derived from uniformity in $d$ of convergence in (8.12). The latter also ensures local uniformity in $\lambda$ of convergence in (8.5), and therefore local uniformity in $\tau$ of convergence of the estimators in (5.12) and (5.16). In practice, this means that if such an estimator applied to a given dataset produces a fair estimate of $P(X \in B_0)$ for some $B_0 \subset \mathbb{R}^m$, then it may also be applied with confidence to the same dataset to estimate the probability of $B_1 \subset \mathbb{R}^m$ such that $P(X \in B_1) \ge P(X \in B_0)^\tau$ for $\tau > 1$ not too large, e.g. $\tau = 2$. If $P(X \in B_0) \ll 1$, e.g. $P(X \in B_0) = 0.01$, this amounts to extrapolation over several additional orders of magnitude in probability. How far one can extrapolate in practice will depend on the rates of convergence to the marginal log-GW tail limits and in (3.1), which will differ from case to case.


Convergence of log-probability ratios as in (5.12) and (5.16) is typical for the probability range (1.1). A stronger notion of convergence might be desirable, but would require restrictive additional assumptions which would be hard to justify in applications. Rather, it is recommended to diagnose bias in estimates and take this into account in estimates of uncertainty. For this reason, modelling of bias and rate of convergence deserves further study.

Deriving asymptotic error distributions will require additional assumptions beyond those for Theorems 4 and 5 and methods quite different from those employed in the present article. Because it is complex (see e.g. Drees & de Haan (2013) for a comparable problem), this important topic needs to be left for a follow-up study as well.

The theory is readily extended from events involving a high value of at least one of the variables to events extreme “in any direction”, by replacing the exponential distribution as standard marginal by the Laplace distribution, cf. Keef et al. (2013). Other choices of standard marginal are also possible, with minor adaptations to theory and estimator.

Furthermore, the main results of this article can be generalised straightforwardly from a random vector in $\mathbb{R}^m$ to a random element of $C_b(K)$, the continuous functions on a compact metric space $K$. Classical multivariate extreme value theory and estimation have been generalised to this setting earlier; see e.g. de Haan & Lin (2001), Part III of de Haan & Ferreira (2006), Einmahl & Lin (2006) and Ferreira & de Haan (2014). For the theory presented here, the main difference between the $\mathbb{R}^m$ setting and the $C_b(K)$ setting is that in the latter, exponential tightness of $\{P(Y/y \in \cdot),\ y > 0\}$ no longer follows from the exponential marginals; it is an independent assumption. In loose terms, it entails that all but an exponentially small probability mass is concentrated on equicontinuous sets of functions in $C_b(K)$ (see e.g. Dembo & Zeitouni (1998)).


8 Proofs and lemmas 

8.1 Proof of Theorem 2 and Corollary 1

Convergence in ( |2.6[ ) is locally uniform in A (e.g. |de Haan fe F erreira] ( [2006 ]), B.1.4 and 
B.2.9), so 


lim sup max 

2/->°o AG[d-i,yl]fe{l,..,m} 


-1 / loggjQA) -loggjfa) ^ _ 




9j(y) 


= 0 VT > 1. (8.1) 


For every $y > 0$, $Q_y$ is injective, so we can define the random vector

$$Y_y := Q_y^{-1}(X) = Q_y^{-1}Q(Y) \quad \text{a.s.} \tag{8.2}$$

with $Y$ defined by (3.11). By (8.1), there exists almost surely for every $\Lambda > 1$ and $\delta > 0$ some $y_{\Lambda,\delta} > 0$ such that for all $y > y_{\Lambda,\delta}$, $\|Y_y - Y\|_\infty > \delta y$ implies $\|Y\|_\infty > \Lambda y$. Therefore, by (3.5), since $\Lambda > 1$ is arbitrary,

$$\limsup_{y\to\infty}\frac{1}{y}\log P(\|Y_y - Y\|_\infty > \delta y) \le -\Lambda \quad \forall \Lambda > 0. \tag{8.3}$$


By Theorem 1, the distribution functions of $\{Y/y,\ y > 0\}$ satisfy the LDP (3.1) with good rate function $I$, so (8.3) implies the same for the distribution functions of $\{Y_y/y,\ y > 0\}$; see Theorem 4.2.13 of Dembo & Zeitouni (1998). Therefore, (3.15) follows from (8.2). To prove (b), note that by (3.15) and (3.7),

lim -? 1 log ^1 — Fj (qj(y)e 9:i ^ he i^^ = —A VA > 1 



proof of the theorem.

For the Corollary, note that for $A \subset [1,\infty)^m$, $P(X \in Q_y(yA))$ in (3.15) is equal to
Ylog.Xi-loggi(y) log X m — log q m (y)\ ^ u , A ^\ 

sT5)-""-i(S)- )<= H >W) 

by (3.13). Therefore, by the contraction principle (see Theorem 4.2.1 in Dembo & Zeitouni (1998)), (3.16) follows from (3.15).


8.2 Proof of Theorem 5


For convenience, the following shorthand notation will be used:

$$\hat p_{n,l}(A) := \hat p_n(\hat Q_n(\hat Y_n/l) \in Q(y_nA)), \tag{8.4}$$

and $\ell_n^+(A) := \ell_n^+(Q(y_nA))$, $\ell_n^-(A) := \ell_n^-(Q(y_nA))$, $\ell_n(A) := \ell_n(Q(y_nA))$.

Proof By Theorem 1, $Y$ defined by (3.11) satisfies the LDP (3.1) with good rate function $I$. As $\epsilon \in (0,(1-c')^{-1})$ with $c'$ as in (5.11), take any $\Delta \in (\epsilon/\inf I(A),\ (1-c')^{-1}/\inf I(A))$. Fixing an arbitrary $\Lambda > 1$, then by Lemma 4, for every $\delta \in (0,\Delta)$ (see (8.4)),

lint sup 

n->°o Ae[-A-i,A], 


\yn 1 ^og(l nd / x (AX) + dinf 1(A) =0 a.s. 


and 


lirnsup sup t/ n 1 \og(L nd / x (AX) < — zlinf 1(A) < — ( a.s. 

n-toc A £[A~ 1 ,A\,d>A 


Choosing S < d/inf 1(A), since A > £/inf 1(A), we observe that 

<£ if d £ [<5, H inf 7(A)] c [8, A] 

>£ if d e (£/ inf I (A), A\ C [<5, A] 


dinf 1(A) 


in 18.51. Therefore, with ( 8 . 61 , using (|5.9|), 


lint sup |A1)1~(AA) — 5/inf J(A)| = 0 a.s. 

n->oo Ae[yl-i,A] 


and similarly, using (5.131, we find that 

lim sup \xin (AA) — d/ inf 1(A) I = 0 a.s. 
n ^°° Ae[A-!,A] I I 

By f8T}, <(878j|, |5A4} and jOJ, 

lim sup Ivn 1 In 1 (AX) log p, n it ax) (AA) + A inf /(A) = 0 a.s. 

n_>00 XelA-i.A] 1 


or equivalently, by (5.15), 


lim sup \y n 1 log Ttn(Q(yn AX)) + Ainf 1(A) = 0 a.s. 

AG [A- 1 , A] 1 1 


(8.5) 

( 8 . 6 ) 


(8.7) 


( 8 . 8 ) 


(8.9) 


( 8 . 10 ) 


Since (3.12) holds with $\inf I(A^\circ) = \inf I(A)$ and $\inf I(A) > 0$, by (3.4) and (8.10),

$$\lim_{n\to\infty}\sup_{\lambda\in[\Lambda^{-1},\Lambda]}\left|\frac{\log\tilde\pi_n(Q(y_nA\lambda))}{\log P(X \in Q(y_nA\lambda))} - 1\right| = 0 \quad \text{a.s.},$$

and (5.16) follows from (5.11), since $\Lambda > 1$ is arbitrary. □

8.3 Proof of Theorem 4

Following the proof of Theorem 5 in Subsection 8.2, (8.7) and (5.10) yield

$$\lim_{n\to\infty}\sup_{\lambda\in[\Lambda^{-1},\Lambda]}\left|y_n^{-1}\log\hat\pi_n(Q(y_nA\lambda)) + \lambda\inf I(A)\right| = 0 \quad \text{a.s.}$$

and the result (5.12) follows as in the proof of Theorem 5.


8.4 Lemmas 


Lemma 1 Let $Y$ be a random vector in $[0,\infty)^m$ with standard exponential marginals satisfying the LDP (3.1) with good rate function $I$, and $Y^{(1)},Y^{(2)},...$ a sequence of iid copies of $Y$. Let the Borel set $A \subset [0,\infty)^m$ be a continuity set of $I$ satisfying $\inf I(A) \in (0,\infty)$. If $(y_n > 0)$ and $\Delta > 0$ satisfy $\lim_{n\to\infty} y_n = \infty$ and

$$\Delta < \liminf_{n\to\infty}\frac{\log n}{y_n\inf I(A)} < \infty, \tag{8.11}$$

then with $\hat p_n$ defined by (5.8),

$$\lim_{n\to\infty}\sup_{d\in[\delta,\Delta]}\left|y_n^{-1}\log\hat p_n(Y \in dAy_n) + d\inf I(A)\right| = 0 \quad \text{a.s.} \quad \forall \delta \in (0,\Delta) \tag{8.12}$$

and

$$\limsup_{n\to\infty}\sup_{d>\Delta} y_n^{-1}\log\hat p_n(Y \in dAy_n) \le -\Delta\inf I(A) \quad \text{a.s.} \tag{8.13}$$


Proof Let A := U^>i(AA); by (3.41 , A is a continuity set of I satisfying inf 1(A) = 
inf 1(A) < 00 . Define the random variable 

v := inf{«; > 0 : Yw £ A} (8-14) 

with inf{0} := 00 , and let G be its distribution function. Since U^> 1 (^IA) C A, 


Y £ A°z => v < z 


■ y £ Az for every z > 0, so by (3.11 and (3.41, 
lim y- 1 \ogG(w/y) = — w^ 1 inf 7(^4) Vw > 0. 

y—> 00 


(8.15) 


Therefore, since inf/(.4) £ (0,oo), —logG(l/Id) £ JiV/n, so by Bingham et al. 
(19871 (Theorem 1.5.2) and (|8.15[) again, for every a > 0, 


lim suply 1 logG(w/y) + w 1 inf I(A)\ = 0. 

y^°°w>a 


(8.16) 


By (8.151, there is for every e > 0 an n e £ N such that for all n>n e , 


nG(a/y n ) > e ^osn-^+a 1 inf I(A))y n 


(8.17) 


Taking a = 1/A, then by (8.111, £ > 0 can be chosen small enough that the 
exponent in ( 8.17|l eventually exceeds elogn. Therefore, 


$$\lim_{n\to\infty} nG(a/y_n)/\log n = \infty. \tag{8.18}$$

With $G^{-1}$ the left-continuous inverse of $G$, almost surely $v^{(i)} = G^{-1}(U^{(i)})$ for all $i \in \mathbb{N}$, with $U^{(1)},U^{(2)},...$ independent and uniformly distributed on $(0,1)$, so almost surely (see def. (5.8)), $\hat p_n(v \le w/y_n) = \hat p_n(U \le G(w/y_n))$ for all $n \in \mathbb{N}$ and all $w \ge a$. Therefore, by Wellner (1978) (Corollary 1) and (8.18),

lim sup|logp n (t; < w/y n ) - \ogG(w/y n )\ = 0 a.s. (8.19) 

n—too w > a 

and since v < w/y n =>Y £ Ay n /(wl) => v < wl/y n for all l > 1 and w > 0, using 
(8.161 and (3.41, as a = 1/A, 

lim sup [y// 1 logp n (Y £ dAy n ) + din!I(A)\ = 0 a.s. 
rwoo de(o,A] 

Therefore, as A C A and inf 1(A) = inf 1(A), 

limsup sup y// 1 logp n (Y £ dAy n ) + dinf 1(A) < 0 a.s. 

ra—too de(0,A] 


( 8 . 20 ) 


( 8 . 21 ) 


A is a continuity set of / and I satisfies (3.41, so there is for every e > 0 a point 
x e £ A° such that I (x e ) < inf I ( A)- |-e . Let e > 0 and y > 1 be such that Ay( inf I (A) T 
e) < liminfn-^oo yn 1 logn (see (8.111). Then for y sufficiently close to 1, an open set 
B C [0, oo) m can be constructed such that 


U A >! (\B) C B, x e £ B \ (By) C .4°, and 
inf I(B°) = inf 1(B) £ (inf 1(A), I(x e )\ 


( 8 . 22 ) 


as follows. The first two requirements on B are satisfied by B' = U^>!(A17) for some 
sufficiently small neighbourhood U C A° of x e , with y > 1 close enough to 1. If B' 
is a continuity set of I, then set B = B'. Else, consider the function / : [0,oo) m x 
[0,1] ->■ [0,oo) m defined by f(y,a) := ay + (1 - a)(|| 2 /|| cx> / HxelloJse- It satisfies 
f(B', 1) = B', f(B', 0) = S'nU A>0 (Aa; e ), and f(B',a) C }(B',a') if a < a'. Therefore, 
a i —y iniI(f(B',a)) is nonincreasing, so with a any of its continuity points in (0,1), 
B = f(B',a) is a continuity set of I and satisfies (8.221. By (8.221, 


Pn(Y £ dAy n ) > p n (Y £ dBy n ) ~ Pn(Y £ dyByn) 
= Pn(Y £ dByn)( 1 - 


(8.23) 


and furthermore, ( |8.20| ) continues to hold after substituting B or By for A. Therefore, 
by (3.41, for every 5 £ (0, A) almost surely, the right-hand side of (8.231 is p n (Y £ 
dBy n )(l T o(l)) uniformly in d £ [5, Zl] and furthermore, using (8.221, 


lim inf inf y n 1 log p n (Y £ dAy„.) + dl(x e ) > 0 a.s. 

n—too d.£[8,A\ 


(8.24) 


Now ( 8.121 follows from ( 8.21 1 and ( 8.24 1, because I(x e ) < inf I (A) + e, and e > 0 can 
be chosen arbitrarily close to 0. Finally, by (8.201, as U,\>i(»4A) c A, 

limsup sup yn 1 \ogp n (Y £ dAy n ) < — Zlinf/(„4) a.s. (8.25) 

n—yoo d>A 


and because $A \subset \mathcal{A}$ and $\inf I(A) = \inf I(\mathcal{A})$, (8.13) follows. □

Lemma 2 Let $Y$ be a random vector on $[0,\infty)^m$ with standard exponential marginals and $Y^{(1)},Y^{(2)},...$ a sequence of iid copies of $Y$. Define $\hat Y_n^{(i)} := (\hat Y_{1,n}^{(i)},..,\hat Y_{m,n}^{(i)})$ for $i = 1,..,n$ with

$$\hat Y_{j,n}^{(i)} := -\log\big(1 - (R_j^{(i)} - 1)/n\big) \tag{8.26}$$

and $R_j^{(i)} := \sum_{l=1}^n 1(Y_j^{(l)} \le Y_j^{(i)})$. For $(y_n > 0)$ satisfying $\liminf_{n\to\infty} y_n/\log n > 0$,

$$\sup_{\varepsilon>0}\limsup_{n\to\infty} y_n^{-1}\log\hat p_n\big(\|\hat Y_n - Y\|_\infty > y_n\varepsilon\big) = -\infty \quad \text{a.s.} \tag{8.27}$$

Proof Since $\hat p_n(\|\hat Y_n - Y\|_\infty > y_n\varepsilon) \le \sum_{j=1}^m \hat p_n(|\hat Y_{j,n} - Y_j| > y_n\varepsilon)$, it is sufficient to prove (8.27) for the univariate case.

Let $\Gamma_n$ and $\Gamma_n^{-1}$ be the empirical distribution function and quantile function of $U^{(1)},..,U^{(n)}$, with $U^{(i)} := \exp(-Y^{(i)})$ uniformly distributed in $(0,1)$ for every $i \in \mathbb{N}$.


Because $\sup_{t\in[1/n,1]} |t/\Gamma_n^{-1}(t)| = \sup_{t\in[U_{1:n},1]} |t^{-1}\Gamma_n(t)|$ (see e.g. Wellner (1978)), by
Theorem 2 of Shorack and Wellner (1978),

\[
\sup_{t\in[1/n,1]} \log(t/\Gamma_n^{-1}(t))/\log_2 n \to 1 \quad \text{a.s.} \tag{8.28}
\]

Similarly, because $\sup_{t\in[1/n,1]} |t^{-1}\Gamma_n^{-1}(t)| \vee 1 = \sup_{t\in[U_{1:n},1]} |t/\Gamma_n(t)|$, by Theorem 3
of Shorack and Wellner (1978),

\[
\sup_{t\in[1/n,1]} (t^{-1}\Gamma_n^{-1}(t))/\log_2 n \to 1 \quad \text{a.s.,}
\]

and therefore

\[
\inf_{t\in[1/n,1]} \log(t/\Gamma_n^{-1}(t))/\log_2 n \to 0 \quad \text{a.s.} \tag{8.29}
\]


Since $Y_{n-i+1:n} = -\log \Gamma_n^{-1}(i/n)$ for $i = 1,..,n$ and $y_n/\log_2 n \to \infty$, (8.28) and
(8.29) imply, using (8.26),

\[
\max_{i\in\{1,..,n\}} |\hat Y_{n-i+1:n} - Y_{n-i+1:n}|/y_n \to 0 \quad \text{a.s.} \tag{8.30}
\]


As a consequence, there is almost surely for every $\delta > 0$ an $n_\delta \in \mathbb{N}$ such that for all
$\varepsilon \ge \delta$, $\hat p_n(|Y - \hat Y_n| > y_n\varepsilon) = 0$ for all $n \ge n_\delta$ and therefore, that $y_n^{-1}\log \hat p_n(|Y - \hat Y_n| > y_n\varepsilon) = -\infty$,
proving the univariate case of (8.27). □
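
Lemma 2 states that replacing each exponential coordinate by its rank-based counterpart (8.26) perturbs the sample by an amount that is negligible relative to $y_n$. A minimal Python simulation sketch of the univariate case follows. The function names, the seed, the scale $y_n = \log n$ and the rank offset 1/2 are assumptions made for illustration only; the ratio decreases slowly in n, so the output should be read as a qualitative check rather than as evidence of a convergence rate.

```python
import math
import random


def rank_standardise(sample, offset=0.5):
    """Map each value to -log(1 - (rank - offset)/n), with rank 1 = smallest.

    The offset 0.5 is an assumption of this sketch.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = -math.log(1.0 - (rank - offset) / n)
    return out


def max_relative_deviation(n, seed=0):
    """Return max_i |Y_hat_i - Y_i| / y_n for a simulated Exp(1) sample."""
    rng = random.Random(seed)
    y = [rng.expovariate(1.0) for _ in range(n)]
    y_hat = rank_standardise(y)
    y_n = math.log(n)  # assumed scale with liminf y_n / log n > 0
    return max(abs(a - b) for a, b in zip(y_hat, y)) / y_n


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, max_relative_deviation(n))
```

The largest deviations occur at the top order statistics, where the sample maximum fluctuates around $\log n$ while its rank-based counterpart is deterministic.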


Lemma 3 Let $X$ be a random vector on $\mathbb{R}^m$ having continuous marginals satisfying
log-GW tail limits, and let $X^{(1)}, X^{(2)}, \ldots$ be a sequence of iid copies of $X$. With $Q$, $\hat Q_n$
and $\hat Y_n$ defined by (3.10), (5.6) and (5.1), let $(k_n)$ satisfy (5.11) and $\hat q_{j,n}$ defined by
(5.2) satisfy (5.5) for $j = 1,..,m$, with $y_n$ defined by (5.3). Then for every $\delta > 0$ and
$\varepsilon > 0$,

\[
\lim_{n\to\infty}\ \sup_{l \ge \delta} y_n^{-1}\log \hat p_n\bigl(\|Q^{-1}\hat Q_n(\hat Y_n l^{-1}) - \hat Y_n l^{-1}\|_\infty > y_n\varepsilon\bigr) = -\infty \quad \text{a.s.} \tag{8.31}
\]


Proof Fix $\varepsilon > 0$ and $\delta > 0$. As in Lemma 2, we only need to prove (8.31) for the
univariate case, so we proceed with this. Note that (8.31) holds if an $n_{\delta\varepsilon} \in \mathbb{N}$ exists
such that (suppressing the labels of vector components in the univariate case)

\[
\sup_{i\in\{1,..,n\}}\ \sup_{l \ge \delta} |q^{-1}\hat q_n(\hat Y_n^{(i)} l^{-1}) - \hat Y_n^{(i)} l^{-1}| < y_n\varepsilon \quad \forall n \ge n_{\delta\varepsilon}. \tag{8.32}
\]

Fixing $\Lambda > \max(1,\delta^{-1})/(1-c) \ge \max(1,\delta^{-1}) \limsup_{n\to\infty} \log(2n)/y_n$ with $c$ as
in (5.11), (8.32) holds for some $n_{\delta\varepsilon} \in \mathbb{N}$ if

\[
\sup_{z\in[0,y_n\Lambda]} |q^{-1}\hat q_n(z) - z| < y_n\varepsilon \quad \forall n \ge n_{\delta\varepsilon}, \tag{8.33}
\]

which is true if $\nu_n$ defined by (5.4) satisfies

\[
\sup_{z\in[y_n,y_n\Lambda]} |\nu_n(z)| \to 0 \tag{8.34}
\]

and also

\[
\sup_{t\in[e^{-y_n},1]} |\log(t/\Gamma_n^{-1}(t))|/y_n \to 0, \tag{8.35}
\]

with $\Gamma_n^{-1}$ the empirical quantile function of $U^{(1)},..,U^{(n)}$ as in the proof of Lemma 2
(note that $q^{-1}\hat q_n(z) = -\log \Gamma_n^{-1}(e^{-z})$ for all $z \in [0,y_n]$). As in the proof of Lemma
2, (8.28) and (8.29) hold. Therefore, since the upper bound in (5.11) implies that
$\liminf_{n\to\infty} y_n/\log n > 0$, (8.35) holds almost surely. Moreover, by (5.5), (8.34) holds
almost surely. This proves the univariate case. □

Lemma 4 Let the random vector $X$ on $\mathbb{R}^m$ have continuous marginals satisfying log-GW
tail limits and let $Y$ defined by (3.11) satisfy the LDP (3.4) with good rate function
$I$. Let $X^{(1)}, X^{(2)}, \ldots$ be a sequence of iid copies of $X$. Let $(k_n)$ satisfy (5.11) and let
the quantile estimator $\hat q_{j,n}$ given by (5.2) satisfy (5.5). Let the Borel set $A \subset [0,\infty)^m$
be a continuity set of $I$ satisfying $\inf I(A) \in (0,\infty)$. Then $\hat p$ defined by (8.4) satisfies
for every $\Lambda > 1$ and every

\[
\Delta \in \bigl(0,\ (1-c')^{-1}/\inf I(A)\bigr) \tag{8.36}
\]

and $\delta \in (0,\Delta)$:

\[
\lim_{n\to\infty}\ \sup_{\lambda\in[\Lambda^{-1},\Lambda],\ d\in[\delta,\Delta]} \bigl| y_n^{-1}\log \hat p_{n,d/\lambda}(A\lambda) + d\inf I(A) \bigr| = 0 \quad \text{a.s.} \tag{8.37}
\]

and

\[
\limsup_{n\to\infty}\ \sup_{\lambda\in[\Lambda^{-1},\Lambda],\ d \ge \Delta} y_n^{-1}\log \hat p_{n,d/\lambda}(A\lambda) \le -\Delta\inf I(A) \quad \text{a.s.} \tag{8.38}
\]


Proof With $\{\mathrm{prop}\}$ denoting the subset of $\Omega$ satisfying the proposition prop, consider

\[
C^{(i)}_{a,b,n} := \bigl\{ \sup_{l \ge a} \|Q^{-1}\hat Q_n(\hat Y_n^{(i)} l^{-1}) - Y^{(i)} l^{-1}\|_\infty > y_n b \bigr\} \tag{8.39}
\]

for $i = 1,..,n$, which are elements of $\mathcal{F}_n$. Following (5.8), we can define empirical
probabilities $\hat p_n(C_{a,b,n}) := n^{-1}\sum_{i\in\{1,..,n\}} 1(C^{(i)}_{a,b,n})$. Combining Lemmas 2 and 3 gives

\[
\lim_{n\to\infty} y_n^{-1}\log \hat p_n(C_{a,b,n}) = -\infty \quad \text{a.s.} \quad \forall a, b > 0. \tag{8.40}
\]

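For every $S \subset \mathbb{R}^m$ and $\iota > 0$, let $S^{\iota} := \{x \in \mathbb{R}^m : \inf_{x'\in S}\|x - x'\| \le \iota\}$ (closed),
and $S^{-\iota} := \{x \in \mathbb{R}^m : \inf_{x'\in S^c}\|x - x'\| > \iota\}$ (open). Set $S^0 := S$. Since $I$ is a good
rate function, Lemma 4.1.6 of Dembo & Zeitouni (1998) implies

\[
\lim_{\iota\downarrow 0} \inf I(A^{\iota}) = \inf I(\bar A) = \inf I(A^{\circ}) = \inf I\bigl(\textstyle\bigcup_{\iota>0} A^{-\iota}\bigr) = \lim_{\iota\downarrow 0} \inf I(A^{-\iota}), \tag{8.41}
\]

so the nonincreasing function $\iota \mapsto \inf I(A^{\iota})$ is continuous in $(-\iota_0,\iota_0)$ for some $\iota_0 > 0$,
and therefore, $A^{\iota}$ is a continuity set of $I$ for every $\iota \in (-\iota_0,\iota_0)$. Moreover, by (8.36),
there exist $\varepsilon > 0$ and $\iota_1 \in (0,\iota_0)$ such that $\inf I(A^{-\iota}) \le \inf I(A) + \Delta^{-1}\varepsilon < \Delta^{-1}(1-c')^{-1}$
for all $\iota \in [0,\iota_1]$. Therefore, for

\[
E^{(i)}_{d,\iota,n} := \{Y^{(i)} \in d\,y_n A^{\iota}\}, \quad i = 1,..,n, \tag{8.42}
\]

Lemma 1 implies for $\Delta$ satisfying (8.36) and every $\delta \in (0,\Delta)$ that

\[
\lim_{n\to\infty}\ \sup_{d\in[\delta,\Delta]} \bigl| y_n^{-1}\log \hat p_n(E_{d,\iota,n}) + d\inf I(A^{\iota}) \bigr| = 0 \quad \text{a.s.} \quad \forall \iota \in [-\iota_1,\iota_1]. \tag{8.43}
\]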

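Therefore,

\[
\liminf_{n\to\infty}\ \inf_{d\in[\delta,\Delta]} y_n^{-1}\log \hat p_n(E_{d,-\iota,n}) > -(1-c')^{-1} \quad \forall \iota \in [0,\iota_1] \quad \text{a.s.} \tag{8.44}
\]

Let

\[
D^{(i)}_{\lambda,d,n} := \{\hat Q_n(\hat Y_n^{(i)}\lambda/d) \in Q(y_n A\lambda)\} \in \mathcal{F}_n, \quad i = 1,..,n. \tag{8.45}
\]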


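By (8.42), (8.39) and (8.45), we have for all $d \ge \delta$, $\lambda \in [\Lambda^{-1},\Lambda]$ and $\iota > 0$
that $E^{(i)}_{d,-\iota,n} \cap (C^{(i)}_{\delta/\Lambda,\iota/\Lambda,n})^c \subset D^{(i)}_{\lambda,d,n}$ and therefore, $\hat p_n(D_{\lambda,d,n}) \ge \hat p_n(E_{d,-\iota,n}) - \hat p_n(C_{\delta/\Lambda,\iota/\Lambda,n})$.
Therefore, for all $\iota \in (0,\iota_1]$,

\[
\liminf_{n\to\infty}\ \inf_{d\in[\delta,\Delta],\ \lambda\in[\Lambda^{-1},\Lambda]} \bigl( y_n^{-1}\log \hat p_n(D_{\lambda,d,n}) - y_n^{-1}\log \hat p_n(E_{d,-\iota,n}) \bigr)
\ge \liminf_{n\to\infty} y_n^{-1}\log\Bigl(1 - e^{\log \hat p_n(C_{\delta/\Lambda,\iota/\Lambda,n}) - \inf_{d\in[\delta,\Delta]} \log \hat p_n(E_{d,-\iota,n})}\Bigr) \ge 0 \quad \text{a.s.,} \tag{8.46}
\]

the last inequality following from (8.40) and (8.44). Therefore, by (8.43) and (8.41),

\[
\liminf_{n\to\infty}\ \inf_{d\in[\delta,\Delta],\ \lambda\in[\Lambda^{-1},\Lambda]} y_n^{-1}\log \hat p_n(D_{\lambda,d,n}) + d\inf I(A) \ge 0 \quad \text{a.s.} \tag{8.47}
\]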


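For all $d \ge \delta$, $\lambda \in [\Lambda^{-1},\Lambda]$ and $\iota > 0$, we have $D^{(i)}_{\lambda,d,n} \cap (C^{(i)}_{\delta/\Lambda,\iota/\Lambda,n})^c \subset E^{(i)}_{d,\iota,n}$, so
$D^{(i)}_{\lambda,d,n} \subset E^{(i)}_{d,\iota,n} \cup C^{(i)}_{\delta/\Lambda,\iota/\Lambda,n}$ and

\[
\hat p_n(D_{\lambda,d,n}) \le 2\max\bigl(\hat p_n(C_{\delta/\Lambda,\iota/\Lambda,n}),\ \hat p_n(E_{d,\iota,n})\bigr). \tag{8.48}
\]

Therefore, by (8.40), (8.43) and (8.41),

\[
\limsup_{n\to\infty}\ \sup_{d\in[\delta,\Delta],\ \lambda\in[\Lambda^{-1},\Lambda]} y_n^{-1}\log \hat p_n(D_{\lambda,d,n}) + d\inf I(A) \le 0 \quad \text{a.s.,} \tag{8.49}
\]

so with (8.47) and (8.4), (8.37) is obtained. By Lemma 1 with $\iota_0$ as above,

\[
\limsup_{n\to\infty}\ \sup_{d \ge \Delta} y_n^{-1}\log \hat p_n(E_{d,\iota,n}) \le -\Delta\inf I(A^{\iota}) \quad \text{a.s.} \quad \forall \iota \in [0,\iota_0). \tag{8.50}
\]

Then (8.38) follows from (8.48) using (8.40), (8.50) and (8.41). □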
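
The proof above sandwiches the event of interest between events defined on a shrunken set $A^{-\iota}$ and an enlarged set $A^{\iota}$. As a minimal one-dimensional illustration of these two constructions, the Python sketch below takes $S$ to be a closed interval; the interval endpoints, the function names and the restriction to one dimension are assumptions of the sketch, not part of the article.

```python
def dist_to_interval(x, lo, hi):
    """Distance from x to the closed interval S = [lo, hi]."""
    return max(lo - x, 0.0, x - hi)


def dist_to_complement(x, lo, hi):
    """Distance from x to the complement of S = [lo, hi] (0 if x is outside S)."""
    return max(min(x - lo, hi - x), 0.0)


def in_enlarged(x, lo, hi, iota):
    """Membership of x in the closed enlargement: distance to S at most iota."""
    return dist_to_interval(x, lo, hi) <= iota


def in_shrunken(x, lo, hi, iota):
    """Membership of x in the open shrinkage: distance to the complement exceeds iota."""
    return dist_to_complement(x, lo, hi) > iota


if __name__ == "__main__":
    lo, hi, iota = 1.0, 2.0, 0.1
    for x in (0.8, 0.95, 1.05, 1.5, 2.05, 2.2):
        print(x, in_enlarged(x, lo, hi, iota), in_shrunken(x, lo, hi, iota))
```

For this interval the enlargement is $[lo - \iota, hi + \iota]$ and the shrinkage is $(lo + \iota, hi - \iota)$, so both families close in on $S$ as $\iota$ decreases to 0, which is the behaviour that (8.41) exploits at the level of $\inf I$.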